Executive summary


Mental health is a vital component of people’s broader well-being...


Mental health plays a central role in people’s lives and is intrinsically tied to many 
other aspects of people’s wider well-being. This was underscored during the COVID-19 
pandemic, when direct health impacts and loss of lives combined with social isolation, 
loss of work and financial insecurity all contributed to a significant worsening of 
people’s mental health. Data from 15 OECD countries suggest that by late 2020, over 
one-quarter of people experienced symptoms of depression or anxiety. Already, well
before the pandemic hit, it was estimated that half of the population will experience a
mental health condition at least once in their lifetime and that the economic costs of
mental ill-health amounted to more than 4% of GDP annually. Good mental health, on
the other hand, can boost people’s resilience to stress, help them realise their goals
and actively contribute to their communities. Positive mental health, or having high
levels of emotional and psychological well-being, is also increasingly being recognised
as a policy target in its own right by health and other government agencies across the
OECD. 


...but guidance on how to best monitor it at the population
level is lacking 


It is essential for governments interested in improving mental health to monitor 
outcomes for both ill-health and positive mental health at the broader population 
level. Statistics that only consider people diagnosed or treated by health care 
professionals are strongly affected by how accessible and developed a country’s 
health care system is, and identifying at-risk groups early on requires tracking 
outcomes well before a person engages with health care services. Moreover, good 
mental health is a foundational asset for the population, and as such, is valuable to 
track in its own right. Successful mental health promotion strategies also require 
understanding of how broader risk and resilience factors, such as people’s material 
conditions, quality of life and social relationships (and inequalities in these), impact 
their mental health. Data on these topics are typically collected in (social) population 
survey statistics that can be expanded to include mental health outcomes, to support
this greater understanding and provide a better evidence base for policy. 


Internationally, data on population-wide mental health outcomes are increasingly 
available but remain infrequently collected and poorly harmonised across countries. 
Several of the population mental health statistics the OECD is regularly publishing in 
its long-standing effort to promote a society-wide response to improving mental 
health are only available on a regular basis for a subset of OECD countries, are more 
than five years old at the time of publication for several countries, and in some cases 
stem from non-official data sources. 


How is this report intended to be used? 


This report aims to support national statistical offices and other data producers in 
collecting high-quality measures of population mental health outcomes in a more 
frequent, consistent and internationally harmonised manner. The OECD took stock of 
what member countries are already doing in this area with a questionnaire that was 
shared with the OECD Committee on Statistics and Statistical Policy in February 2022. 
Almost all national statistical offices, and in many cases also health agencies, 
reported back. The report documents existing measurement practice to identify where 
countries are converging when it comes to gathering population mental health 
outcomes, and where gaps remain. In addition, available measurement tools are 
assessed to provide recommendations for priority measures official data producers 
can adopt in household, health and social surveys. 


Key messages and recommendations 


• Collecting data on both mental ill-health and positive mental health at
the population level would yield a more complete picture of mental health. 
Integrating relevant questions in population surveys that include information on other 
aspects of people’s lives would help better understand the drivers and policy levers 
associated with improving mental health outcomes. This can provide new avenues for
proactive rather than reactive policy design, and mental health strategies that both
reduce ill-health and promote good emotional and psychological well-being. 

• The pandemic has spurred new efforts in mental health data collection,
and it will be important to keep up the momentum. Since March 2020, 7 in 10 
OECD countries added mental health modules to existing surveys or launched new 
mental health surveys, many of them administered multiple times per year, or even 
more frequently. It is unclear whether these will continue in the future. A return to 
business as usual prior to the pandemic would mean that half of OECD countries only 
collect mental health data every four to ten years. 


• Some aspects of mental health are more frequently captured than
others, and there is scope for better cross-country harmonisation of 
measures. This is reflected in particular by a lack of harmonisation when it comes to
measuring symptoms of anxiety, affect and eudaimonia (i.e. a sense of meaning and 
purpose in life), and very uneven use of tools that assess specific mental health 
conditions beyond anxiety and depressive disorders. 

• The report suggests adding four specific tools in relevant population
surveys to build a small set of more internationally harmonised population 
mental health indicators. These recommendations have been formed based on a 
comparative assessment of their statistical quality, their response burden and cost, 
and existing data collection practices. They do not imply the phasing out of other 
tools that OECD countries are already using to capture population mental health 
outcomes. 


o Mental ill-health - priority recommendation: The Patient Health
Questionnaire-4 (PHQ-4) could be included in more frequent surveys, alongside the 
regular collection of the PHQ-8 or PHQ-9 in health surveys. It covers symptoms of both 
depression and anxiety, and does so with only four questions. 


o Positive mental health - recommendation: Based on trends in country
measurement practice, either the WHO-5 or SWEMWBS could be used to measure 
affective and eudaimonic aspects of positive mental health in a comparative way. The 
topic of measuring affect and eudaimonia specifically will continue to be explored in 
future OECD work on subjective well-being.

o General mental health status - recommendation: Similar to commonly
used questions that ask respondents to rate their physical health, a single question 
about a respondent’s general mental health status could be included in a range of 
surveys across a country’s broader data infrastructure system. Over half of countries 
include such questions already, though question wording varies widely. The following 
framing has been adopted by at least three OECD countries: “In general, how is your 
mental health? Excellent / Very good / Good / Fair / Poor.” 


1. What is population mental health and why
should we measure it?


Abstract 


Mental health is a vital component of people’s well-being, and measuring it is 
essential to monitor what ultimately matters to people. The aim of this report is to 
encourage official data producers to collect data on population mental health
outcomes more frequently and in an internationally harmonised manner. Considering 
all aspects of mental health, ranging from mental ill-health (which may or may not
include a diagnosed mental health condition) to positive mental states, can provide
new avenues for a proactive rather than reactive design of mental health systems and 
services, and it can open up space for policy to focus on both reducing illness and 
promoting people’s flourishing. Collecting data on mental ill-health and positive 
mental health in household, social and health surveys would yield a more complete 
picture of mental health and help to better understand the drivers and policy levers 
for improving it. 


Good mental health is a vital component of people’s well-being. Good mental health 
enables individuals to realise their own potential, cope with the stresses of life, work 
productively and make a positive contribution to their communities (World Health 
Organization, 2019[1]). Mental ill-health, on the other hand, accounts for one of the
largest and fastest-growing categories of the burden of disease worldwide; its 
economic costs, including investment in the mental health system and lower 
employment and productivity, are estimated at more than 4% of GDP in OECD 
countries (Rehm and Shield, 2019[2]; OECD, 2021[3]). In 2016, already well before 
the COVID-19 pandemic, deaths of despair (due to suicide, acute alcohol abuse or 
drug overdose) were one of the largest causes of preventable deaths in OECD 
countries, six times higher on average than deaths due to homicide, and three times 
higher than road deaths (OECD, 2020[4]). Over the period 2016-18, one in eight 
people living in OECD countries experienced more negative than positive emotions 
during a typical day (OECD, 2020[4]). 


Mental health has come to the forefront of the public debate during COVID-19. 
Besides the direct effect of the pandemic in terms of the high number of lives lost, 
social isolation, loss of work and financial insecurity all led to a significant worsening 
of people’s mental health, with more than a quarter of people in 15 OECD countries 
experiencing symptoms of anxiety or depression by late 2020 (OECD, 2021[3]; OECD, 
2021[5]). Populations living in vulnerable situations, including women, young people, 
those in precarious employment and financial situations, racial and ethnic minorities, 
and people living with existing mental health conditions and substance use disorders, 
have been particularly affected. 


While it is clear that mental health matters for people’s well-being, and that 
substantial parts of the population are living with and affected by mental ill-health, 
discussions so far have not focused sufficiently on how governments should best
monitor it at the broader population level, and on how to consider both mental ill- 
health and positive mental states. This also requires a conversation about what 
exactly is meant by “mental health” and about which outcomes are most relevant for 
policy makers responsible for treatment, prevention and promotion strategies. 


This chapter first makes the case for why regular measures of population-level mental 
health outcomes should be collected. It then presents how different components of 
mental health, including mental ill-health and positive mental health, have been 
distinguished in research and practice.1 This provides the basis for a common 
understanding and terminology used throughout this report, including in the 
subsequent chapters on available measurement tools and current measurement 
activities in OECD countries (Chapter 2) and on what is known about their statistical 
quality and measurement practice (Chapter 3). 


The importance of focusing on population mental health 
outcomes 


The OECD has a long record of collating international health statistics and promoting a 
society-wide response to improving mental health. This includes the 2015 OECD 
Recommendation on Integrated Mental Health, Skills and Work Policy and its follow-up 
report, Fitter Minds, Fitter Jobs, as well as the recent publication A New Benchmark for 
Mental Health Systems, which sets out a framework for understanding mental health 
performance and assesses whether countries are delivering the policies and services 
that matter for health system performance (OECD, 2021[3]; OECD, 2015[6]; OECD, 
2021[7]). Preventing mental illness, promoting mental well-being and taking a 
multisectoral approach to mental health are amongst the key principles of the
OECD’s New Benchmark framework, and a number of population-level outcomes
indicators are included under these principles (life satisfaction, suicide rate and 
inequalities in mental distress by education and employment status). In addition, the 
OECD How’s Life? reports (which assess well-being, inequality and sustainability in 
over 40 member and partner countries, see Box 1.1) also feature a range of outcome 
indicators relevant to mental health. However, several of these are produced 
irregularly, only cover a subset of OECD countries and in some cases are drawn from 
non-official sources.2 


Box 1.1. Measuring people’s well-being 


The OECD Well-being Framework is a broad outcome-focused tool to measure human 
and societal conditions and assess whether life as a whole is getting better for people 
living in OECD countries (OECD, 2020[4]). It includes both current well-being in the 
“here and now”, which focuses on living conditions at the individual, household and 
community levels, and systemic resources needed to sustain well-being in the 
future.3 The Well-being Framework underpins the OECD How’s Life? report series and 
a wide range of other OECD work related to well-being (for an overview,
see https://www.oecd.org/wise/).

Figure 1.1. The OECD Well-being Framework
Source: OECD (2020[4]), How’s Life? 2020: Measuring Well-being, OECD Publishing,
Paris, https://doi.org/10.1787/23089679. 


Mental health is not explicitly identified as a separate dimension of well-being in the 
framework, but mental health outcomes are relevant to several dimensions: 


• First, and foremost, the broad “health” dimension of the Framework
encompasses both mental and physical health. For example, two indicators of mental ill- 
health (deaths from suicide, acute alcohol abuse and drug overdose, and the share of 
people at risk of depression) were included in the OECD How’s Life? 2020 report under 
this dimension. 

• Second, the “subjective well-being” dimension encompasses elements of good
psychological functioning, notably eudaimonia and positive and negative affect. People’s 
own evaluation of their lives (e.g. life satisfaction) is also included here. 

• Last, “Human capital”, included under resources for future well-being, refers to
“the knowledge, competencies, skills and health status of individuals, which are viewed 
here from the perspective of their contribution to future well-being” and includes 
indicators such as premature mortality and obesity prevalence (OECD, 2020[4]).4 


In addition, several aspects of positive functioning that are often included in broad 
definitions of positive mental health, such as social connections, financial security 
(income and wealth), and knowledge and skills, are captured by separate dimensions 
within the OECD Well-being Framework. 


The aim of this report is to encourage official data producers to collect data on 
population-level mental health outcomes more frequently and in an internationally 
harmonised manner. This is in line with a well-being approach that assesses what 
ultimately matters to people themselves and their capabilities to live a life of their 
choosing (in this case, feeling mentally healthy and free of mental distress) (OECD, 
2020[4]). Moreover, several well-being drivers measured by more frequently available
input or output indicators may be imperfectly correlated with such outcomes (e.g.
mental health expenditure is a poor proxy of mental health status if the health care 
system is inaccessible; similarly, the number of drugs prescribed says little about 
people’s (mental) health conditions) (OECD, 2011[8]). 


Collecting data on mental health status for the entire population, rather than only for 
people diagnosed or treated by health care professionals, is important for a number of 
reasons. First, measures focusing on the numbers diagnosed might only reflect how 
accessible and developed a country’s health care system is, and how likely people 
(and certain population groups) are to seek treatment. Second, strategies to prevent 
mental ill-health would benefit from identifying at-risk groups early on, so they
necessitate tracking outcomes prior to, and following, engagement with the health
system. Third, positive mental health is a foundational asset for the population, and as
such, is valuable to track in its own right. Linking mental health with the broader risk
and resilience factors typically also collected in population (survey) statistics, such as 
people’s material conditions, quality of life and social relationships (and inequalities in 
these), can equally support mental health strategies. 


Concepts of mental health: From illness to wellness 


Previous OECD work on mental health has adopted the widely accepted definition of 
mental health by the World Health Organisation (WHO): “a state of well-being in which 
the individual realises his or her abilities, can cope with the normal stresses of life, 
can work productively and fruitfully, and is able to make a contribution to his or her 
community” (OECD, 2021[3]; OECD, 2015[6]). This definition explicitly states that 
mental health is not the absence of illness and encompasses multiple aspects of 
psychosocial functioning (World Health Organization, 1948[9]).5 


Various theories about what mental health entails have been developed over the past 
decades. These range from those focusing on symptoms of mental illness either being 
present or not (“binary model”), to those conceiving of mental health as a spectrum of 
experience (“single-continuum model”), all the way to viewing mental ill-health and 
positive mental health as related but distinct experiences (“dual-continuum model”) 
(Figure 1.2). Each of these models carries different implications for which mental 
health outcomes need to be tracked in order to capture the concept in its
entirety. 


Figure 1.2. Models of mental health 
Stylised conceptual frameworks of mental health 
Source: Adapted from Iasiello et al. (2020[10]), “Mental Health and/or Mental Illness: A
Scoping Review of the Evidence and Implications of the Dual-Continua Model of Mental
Health”, Evidence Base, 10.21307/eb-2020-001; Keyes, C. (2005[11]), “Mental illness
and/or mental health? Investigating axioms of the complete state model of health”, 
Journal of Consulting and Clinical Psychology, 73(3): 539. 

The binary model 


Clinical psychology, psychiatry and research more generally have historically focused 
on the reduction of mental illness symptoms, or psychopathology, in order to improve 
mental health. In this “disease-centred” perspective, mental illness (in the form of 
conditions defined by psychiatric classification systems) is the focal concept, and the 
goal of intervention is primarily to help reduce the associated symptomatology, rather 
than to support people into wellness. In this perspective, an individual is capable of 
experiencing one of two alternative states: either being diagnosed as mentally ill or 
being presumed mentally healthy (Routledge et al., 2016[12]; Keyes, 2005[11]; Trent, 
1992[13]).


Binary categorisations of mental illness can be useful, for instance, when a person is 
trying to access appropriate health care or other support services or for defining 
guidelines and treatment pathways to manage diagnosed conditions. However, 
practitioners and researchers have criticised the reductionist nature of this model, i.e. 
the notion of an arbitrary point where illness transitions to full health and the 
presumed impossibility of “gaining” more mental health once the threshold of no 
diagnosable condition is crossed (Herron and Trent, 2000[14]). 


Mental health as a continuum 


An alternative approach is to characterise mental health as a continuum of 
experience, from severe mental ill-health, on one end of the spectrum, through to 
positive mental health (high levels of emotional and psychological well-being) on the 
other (Patel et al., 2018[15]; Payton, 2009[16]; Greenblat, 2000[17]). This view is 
rooted in a “salutogenic” approach that focuses on factors that support health and 
well-being, beyond the traditional focus on risks, symptoms and problems. It 
acknowledges a wider breadth of people’s experiences (which are different for 
someone who might feel worried or has trouble sleeping compared to a person 
experiencing a full-blown episode of major depression). 


In this model, “everyone has mental health”, and an individual can move up and down 
the spectrum throughout their life (including up to a daily or weekly basis) depending 
on the context they find themselves in, the challenges they face and the internal and 
external resources available to them. Some researchers have used the metaphor of a 
river, rather than a linear continuum, to express this constant process and the fluidity 
of different states between acute mental ill-health and positive mental health
(Figure 1.3) (Koushede and Donovan, 2022[18]).


Figure 1.3. The mental health continuum as a river 
Source: Koushede, V. and R. Donovan (2022[18]), “Applying Salutogenesis in Community-
Wide Mental Health Promotion”, The Handbook of Salutogenesis. Springer, 
Cham. https://doi.org/10.1007/978-3-030-79515-3_44.


From a policy perspective, considering the full spectrum between mental ill-health and
positive mental health carries implications for both targeting and designing 
interventions and can provide new avenues for proactive rather than reactive system 
and service design. The single-continuum model adds value vis-a-vis the binary 
perspective by providing the space for mental health strategies to focus not just on 
“curing” (diagnosed) illness or reducing the associated symptoms, but also on 
preventing people in the middle of the spectrum from doing worse and on promoting 
mentally healthy populations. 


Mental ill-health and positive mental health as a dual continuum 


A third conceptual view, increasingly considered by international players such as the 
World Health Organisation, several public health agencies, national statistical offices 
and other government departments, more clearly differentiates between mental ill-health,
on one side, and positive mental health, on the other (Statistics Canada, n.d.[19];
Australian Early Development Census, 2012[20]; Swiss Health Observatory, n.d.[21];
Government of Western Australia Mental Health Commission, 2021[22]; Queensland
Government, 2015[23]; World Health Organization, 2022[24]).
This “dual-continuum” model characterises mental ill-health and positive mental 
states as related but distinct experiences (placing them on two different but 
intersecting continua), rather than as extreme ends of a single spectrum.6 


Mental ill-health and positive mental health, or high levels of emotional and 
psychological well-being, are closely interconnected. Gains in good mental health at
the population level imply declines in average mental disorders over time, while
experiencing positive mental health decreases the risk of developing a mental 
disorder, can help recovery once it has been developed and is thus considered an 
important resilience factor (Keyes, Dhingra and Simoes, 2010[25]; Robinson, 
2012[26]; Santini et al., 2022[27]). 


Proponents of the dual-continuum model, however, argue that the association 
between ill-health and positive mental health is not linear, as the single-continuum 
model might suggest: the mere absence of clinically significant symptoms of mental 
ill-health, or diagnosed conditions, does not always imply a thriving mental state. 
Conversely, a person could have symptoms of a mental disorder and associated 
distress and disability, but also be satisfied with their life as a whole and achieving 
their potential (Galderisi et al., 2015[28]). This view, which aims to acknowledge the 
full diversity of human experiences, is also often echoed by people with lived 
experience of mental health conditions (New Zealand Initial Mental Health and 
Wellbeing Commission, 2020[29]). 


The majority of research supporting a dual continuum has relied on confirmatory 
factor analysis (CFA) to compare whether survey data best fit a single- or dual- 
continuum model. Keyes (2005[11]) measured aspects of emotional, psychological and
social well-being7 and some common forms of mental illness (presence of a major 
depressive episode, generalised anxiety disorder, panic disorder, or alcohol 
dependence in the past year) in a nationally representative sample of US adults. He 
then used CFA to highlight the existence of two correlated but separate latent factors. 
Additional studies of non-US populations using a variety of measurement tools for 
both positive mental health and mental illness have further supported the notion of 
the dual-continuum model. A recent review identified 83 peer-reviewed empirical 
articles, including cross-sectional, longitudinal and intervention studies, which 
provided support for the superior explanatory power of dual-continuum models of 
mental health over a single-continuum model (Iasiello, van Agteren and Cochrane,
2020[10]; Routledge et al., 2016[12]).8 


The typical visualisation of two completely orthogonal axes in the dual-continuum 
model can, however, be misleading. Several studies classify individuals into separate 
groups around the model’s quadrants, using variations of categories such as 
“complete mental health” (no mental illness, high positive mental health), 
“vulnerable” (low mental illness, low positive mental health), “symptomatic but 
content” (high mental illness, high positive mental health) and “struggling” (high 
mental illness, low positive mental health) (Iasiello, van Agteren and Cochrane,
2020[10]). Distributions within these categories, however, strongly suggest that levels 
of positive mental health and mental ill-health are highly related and that mental 
health conditions bring significant impairments for emotional and psychological well- 
being. For instance, a study of Australian schoolchildren shows that only around 5% of 
children experience either high levels of positive mental health but also mental ill- 
health, or low levels of positive mental health but no mental ill-health (Figure 1.4). 
Similarly, while in a study by Keyes only one in five people who had no diagnosed 
mental health condition in the past year recorded high positive mental health, even 
fewer respondents with a mental disorder were likely to do so (Table 1.1) (Keyes, 
2005[11]).
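
The four-way grouping described above can be illustrated with a minimal sketch. It assumes, for simplicity, that each respondent has already been assigned two yes/no flags; the function name, the flags and whatever cut-offs sit behind them are hypothetical and are not taken from the studies cited. Only the category labels come from the text.

```python
def classify_dual_continuum(has_mental_illness: bool,
                            high_positive_mental_health: bool) -> str:
    """Assign a respondent to one quadrant of the dual-continuum model.

    Both inputs are assumed to be pre-computed flags; how a study derives
    them (diagnostic interview, symptom scale, cut-off) is left open here.
    """
    if not has_mental_illness and high_positive_mental_health:
        return "complete mental health"
    if not has_mental_illness:
        return "vulnerable"
    if high_positive_mental_health:
        return "symptomatic but content"
    return "struggling"


# Illustrative respondents only (not real data).
for flags in [(False, True), (False, False), (True, True), (True, False)]:
    print(flags, "->", classify_dual_continuum(*flags))
```

Reducing each axis to a binary flag is what makes the quadrant picture look fully orthogonal; the distributions discussed in this section show that, in practice, the two off-diagonal groups are small.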


In the same study, experiences of positive mental health also vary strongly according 
to the type of psychological disorder experienced in the past year (and its severity at 
the time of the survey), ranging from only 2% for those with generalised anxiety 
disorder to 8% for those who were alcohol-dependent (Table 1.1). Nevertheless, the 
share of respondents with a high degree of mental ill-health who can attain some 
degree of positive mental health is not insignificant. Lesser-known interlinkages
between various aspects of emotional and psychological well-being and different,
even severe, mental health conditions are also possible: some studies suggest that, 
compared to psychologically-healthy adults, people with depression might react to 
negative events with less distress, while people with bipolar disorder experience 
greater positive emotions during mania, people with schizophrenia can construct 
meaning from their hallucinations and delusions, and trauma survivors can live
meaningful lives upon coping with their stressful experiences (Goodman, Doorley and 
Kashdan, 2018[30]). 


Figure 1.4. The dual continuum of mental health in Australian children 
Share of children in their first year of formal full-time schooling experiencing 
different degrees of mental ill-health (anxiety, depression, behaviour problems) 
and positive mental health (psychosocial functioning), Australia, 2012
Note: Data are drawn from the 2012 national Australian Early Development Index, and
responses about children were provided by their school teachers. Darker shaded fields
refer to the share of children who have either low positive mental health but no mental
illness or those who experience mental illness but also high positive mental health. The 
original source termed these categories as mental health difficulties (e.g. anxiety 
disorders, depression, behavioural problems) and mental health competency (e.g. healthy 
psychosocial functioning). 
Source: Australian Early Development Census (2012[20]), The mental health of Australian
children: A dual continuum,
https://www.aedc.gov.au/resources/detail/the-mental-health-of-australian-children-a-dual-continuum.


Table 1.1. Different mental health conditions influence the extent to which 
positive mental health is achievable 


Share of adults with a mental health condition in the past 12 months that report low, 
moderate or high positive mental health, United States, 1994-95 


                                    Low positive      Moderate positive   High positive
                                    mental health     mental health       mental health
                                    (languishing)                         (flourishing)

Overall sample (n=3032)             16.9%             65.1%               18%
Major depressive episode (n=422)    33.9%             60.2%               6.2%
Generalised anxiety disorder (n=98) 55.1%             41.8%               2%
Panic disorder (n=204)              39.2%             58.3%               2.5%
Alcohol dependence (n=194)          24.7%             69.1%               7.7%
Comorbidity (n=193)                 43.5%             55.4%               2.1%


Note: Data are drawn from the “Midlife in the United States” study. Mental disorders were 
measured by the Composite International Diagnostic Interview Short Form (CIDI-SF) scale. 
Flourishing (languishing) was defined as an individual exhibiting high (low) levels on one of two 
questions about positive affect and high (low) levels on six of 11 questions about positive 
functioning (per Ryff’s scales of psychological well-being and Keyes’ scales of social well-being). All 
other respondents were categorised within moderate positive mental health. Comorbidity refers to 
the experience of more than one mental health condition, regardless of in which combination. 


Source: Keyes, C. (2005[11]), “Mental illness and/or mental health? Investigating axioms of the 
complete state model of health”, Journal of Consulting and Clinical Psychology, 73(3): 539. 
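
The categorisation rule given in the note to Table 1.1 can be sketched as follows. This is an illustration under stated assumptions, not the original study's code: each item is assumed to have been labelled "high", "low" or "mid" beforehand (how items are scored into those labels is not specified in the note), and "one of two" and "six of 11" are read here as "at least one" and "at least six". The function name and input format are hypothetical.

```python
def categorise_positive_mental_health(affect_items, functioning_items):
    """Return 'flourishing', 'languishing' or 'moderate'.

    affect_items: 2 labels, one per positive-affect question.
    functioning_items: 11 labels, one per positive-functioning question.
    Each label is 'high', 'low' or 'mid' (assumed to be pre-computed).
    """
    if len(affect_items) != 2 or len(functioning_items) != 11:
        raise ValueError("expected 2 affect items and 11 functioning items")

    def count(items, label):
        return sum(1 for item in items if item == label)

    # Flourishing: high on >= 1 affect item and on >= 6 functioning items.
    if count(affect_items, "high") >= 1 and count(functioning_items, "high") >= 6:
        return "flourishing"
    # Languishing: low on >= 1 affect item and on >= 6 functioning items.
    if count(affect_items, "low") >= 1 and count(functioning_items, "low") >= 6:
        return "languishing"
    # Everyone else falls into moderate positive mental health.
    return "moderate"


# Illustrative respondent only (not real data).
print(categorise_positive_mental_health(
    ["high", "mid"], ["high"] * 6 + ["mid"] * 5))
```

Because there are only 11 functioning items, a respondent cannot be high on six and low on six at once, so the two non-moderate categories are mutually exclusive under this reading.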


By distinguishing between mental ill-health and positive mental health, the dual- 
continuum model also implicitly suggests that the relative importance of their 
respective drivers differs. This is important for policy and clinical practice, as the same 
strategies for preventing mental illness might not be sufficient for enhancing positive 
mental states, and vice versa. Evidence on this is still emerging. Some population- 
based studies from Denmark and England have suggested that deprivations in 
people’s material conditions and quality of life (such as low income and educational 
attainment, lack of employment and financial insecurity) predict outcomes at the tail- 
end of each continuum (i.e. both mental ill-health and low levels of positive mental 
health). These same socio-economic factors did not play an equally strong role in
determining high levels of positive mental health (Stewart-Brown et al., 2015[31];
Nielsen et al., 2016[32]; Santini et al., 2020[33]). However, population-based
data from Canada and Slovenia suggest that higher financial security and
household income are indeed associated with increased odds of psychological well- 
being (Varin et al., 2020[34]; Vinko et al., 2022[35]). By contrast, relational factors 
such as greater social connectedness, improved family relations and participation in 
recreational activities have been associated with both reduced risk of mental health
conditions and higher positive mental health in the majority of studies (Van
Lente et al., 2012[36]; Santini et al., 2020[33]; Santini et al., 2017[37]; Solin et al., 
2019[38]; Thoits, 2011[39]). 


The value-add of the dual-continuum model (over the single continuum) is that it 
more explicitly communicates that both mental ill-health and good mental states 
impact people’s lives. From a measurement perspective, the dual-continuum model 
suggests that collecting data on both mental health and positive functioning in 
population surveys and health assessments would yield a more complete picture of 
mental health. This would also help to identify the factors, and by extension policy 
levers, associated with the dual goals of improving positive mental health and 
reducing mental illness. This report hence considers the two constructs separately 
where possible and defines each in more detail in the following sections. 


Mental ill-health 


The term mental ill-health refers to diagnosable mental and behavioural conditions, as 
well as the transdiagnostic characteristic of psychological distress. 


Mental health conditions 


The terms “conditions” or “disorders” are used in this report to describe symptoms 
reaching the clinical threshold of a diagnosis according to psychiatric classification 
systems such as the World Health Organization International Classification of Disease 
(ICD) or the American Psychiatric Association Diagnostic and Statistical Manual 
(DSM).9 There are more than one hundred separate diagnoses and disorders featured 
in these classification systems, including mild or moderate anxiety and depression, 
drug and alcohol use disorders, and severe disorders such as severe depression, 
bipolar disorders and schizophrenia, each with their own specific symptoms, age of 
onset and trajectory (Box 1.2). The experience of mental health conditions can be 
highly fluid both over the life-course and over much shorter periods of several weeks - 
e.g. an individual experiencing a moderate depressive episode can worsen so that the 
condition becomes severe, just as a severe episode can be stabilised with the 
symptoms lessened or alleviated (OECD, 2021[3]). 


It is estimated that half of the population will experience a mental health condition in 
their lifetime and about one in five people in any given year (OECD, 2019[40]). The 
data currently available from population-based surveys often focus on experiencing 
symptoms of anxiety and depression (see Chapter 2). Pre-COVID-19 point estimates 
from the Institute for Health Metrics and Evaluation (IHME) suggest that the most 
common mental disorder in EU countries is anxiety disorder, with an estimated 
25 million people (or 5.4% of the population) living with this condition in 2016, 
followed by depressive disorders, which affected over 21 million people (or 4.5% of 
the population). An estimated 11 million people across EU countries (2.4%) have drug 
and alcohol use disorders. Severe mental illnesses such as bipolar disorders affected 
almost 5 million people (1% of the population), while schizophrenic disorders affected 
1.5 million people (0.3%) (OECD/European Union, 2018[41]).10 


Box 1.2. Examples of mental health conditions and their symptoms 


According to the DSM, a mental health condition is a syndrome characterised by a 
clinically significant disturbance in an individual's cognition, emotion regulation or 
behaviour that reflects a dysfunction in the psychological, biological or developmental 
processes underlying mental functioning. Mental disorders are usually associated with 
significant distress or disability in social, occupational or other important 
activities (American Psychiatric Association, 2013[42]). Comorbidity of mental 
disorders and physical illnesses and multiple mental health problems is common. The 
most recent version, DSM-5, was published in 2013 and lists a total of 157 diagnoses 
and close to 300 disorders. Some of the most common clusters of disorders featured 
include: 


Mood/affective disorders 


Mood disorders, or affective disorders, are characterised by a disturbance of the 
general emotional state that interferes with an individual’s ability to function. Various 
forms of mood disorders exist: for instance, a major depressive disorder is 
characterised by persistent periods of low mood, low self-esteem and loss of interest 
in usually pleasurable activities lasting at least two weeks. Physical symptoms such as 
fatigue, headaches or digestive problems are also common. Bipolar disorder is 
characterised by alternating periods of depression and periods of mania 
(pathologically elevated mood, arousal and energy levels). 


Anxiety disorders 


Anxiety disorders are characterised by excessive and uncontrollable feelings of 
anxiety and fear. Specific symptoms depend on the type of anxiety disorder present. 
The most common anxiety disorders are generalised anxiety disorder, panic disorder 
and social anxiety disorders. In addition, various specific phobias (a fear of specific 
objects or situations) exist, like intense fear of heights or of flying. 


Substance use disorders 


Substance use disorder is a condition characterised by an uncontrollable intake of 
substances despite adverse consequences, and it is often accompanied by emotional, 
physical and behavioural problems and an inability to stop consuming despite several 
attempts. For instance, alcohol use disorder is a type of substance use disorder and 
includes frequent and heavy alcohol use. 


Adjustment disorders 


An adjustment disorder is characterised by a maladaptive emotional or behavioural 
reaction to a psychosocial stressor. Adjustment disorders occur when individuals have 
significant difficulty adjusting to or coping with a stressful life event. For example, 
post-traumatic stress disorder (PTSD) usually develops due to exposure to traumatic life 
events or threatening situations, such as war, sexual assault or child abuse. 
Symptoms can range from sleeping difficulties, difficulty concentrating or irritability to 
hypervigilance and an exaggerated startle response. 


Psychotic disorders 


Psychotic disorders are severe mental health conditions with delusions and 
hallucinations as common symptoms. The most common psychotic disorder is 
schizophrenia, in which people interpret and experience reality abnormally and which 
is characterised by a combination of hallucinations, delusions and extremely 
disordered thinking and behaviour that impairs daily functioning. 


Personality disorders 


Personality disorders are characterised by long-term maladaptive patterns of 
behaviour, cognition and inner experience that differ significantly from the cultural- 
social norm. They are associated with difficulties in cognition, emotiveness, 
interpersonal functioning or impulse control. Three clusters of personality disorders 
exist: odd or eccentric disorders; dramatic, emotional or erratic disorders; and anxious 
or fearful disorders. 


Somatoform and dissociative disorders 


Somatoform disorders are disorders causing physical symptoms that might not be 
traceable to a somatic cause. Dissociative disorders include problems with memory, 
awareness, perception or identity; people experiencing dissociative disorder might 
feel disconnected from their body or develop different identities. 


Eating disorders 


An eating disorder is characterised by abnormal eating behaviours that affect physical 
and/or mental health. Various types of eating disorders exist, the most common being 
bulimia nervosa, anorexia nervosa and binge eating disorder. Eating disorders are 
often comorbid with anxiety disorders, depression and substance abuse. 


Obsessive-compulsive disorder 


Obsessive-compulsive disorder is a mental and behavioural disorder characterised by 
intrusive, recurring thoughts or mental images (obsessions) that generate feelings 
of anxiety, disgust or discomfort, which in turn elicit an urge to perform a certain task 
or routine, such as hand washing, counting, cleaning or arranging things, in order to 
relieve this discomfort (compulsions). 


Source: (World Health Organization, 2021[43]; American Psychiatric Association, 
2013[42]). 


Psychological distress 


The term “psychological distress” is used in this report to refer to non-specific 
symptoms of negative affect (such as sadness, anguish, restlessness), sometimes 
combined with somatic symptoms (such as inability to sleep or loss of appetite) that 
do not reach the clinical threshold of a diagnosis within psychiatric classification 
systems. 


There is some debate about whether psychological distress and mental conditions 
form conceptually distinct phenomena. Some researchers have argued that they are 
qualitatively distinct: psychological distress should only be considered as part of a 
pathological psychological process and a marker of a mental health condition if it is 
persistent and in excess of an “expectable response” to adverse events and other 
stressors. However, this is difficult to determine in practice and may depend on an 
individual’s socio-economic and overall life conditions (Horwitz, 2007[44]; Phillips, 
2009[45]; Payton, 2009[16]; Roger T. Mulder, 2008[46]; Wakefield et al., 2007[47]). 
The DSM-5 does not provide any criteria for determining when distress becomes 
clinically significant; an assessment is usually made based on the degree 
of impairment to functioning produced by the distress, rather than its 
“appropriateness”. 


Many of the tools developed to assess psychological distress in individuals, as 
documented in Chapter 2 of this report, are able to reliably distinguish cases of 
serious mental health conditions from non-serious cases. This suggests that mental 
disorder and distress, as a transdiagnostic characteristic of most mental health 
conditions, are indeed closely related (Barlow and Durand, 2009[48]). Moreover, even 
if the experience of psychological distress is temporary, it can imply 
significant suffering and hardship for individuals and deserves attention in its own 
right. 


Positive mental health 


Positive mental health covers psychological, emotional, and in some cases also social, 
relational and spiritual well-being (Huppert, 2005[49]; Keyes, 2005[11]; Steger et al., 
2006[50]; Reis and Gable, 2003[51]).11 


The concept of positive mental health is closely related to that of subjective well- 
being, which refers to “good mental states, including all of the various evaluations, 
positive and negative, that people make of their lives, and the affective reactions of 
people to their experiences” (OECD, 2013[52]). In 2013, the OECD 
published Guidelines on Measuring Subjective Well-being that identified three broad 
aspects of subjective well-being (and proposed measures for data collectors): 


• Life evaluation - a reflective self-assessment of a person’s life as a whole, or 
some specific aspect of it (e.g. life satisfaction measures; satisfaction with financial 
situation) 

• Affect - a person’s feelings, emotions or states, typically measured with 
reference to a particular point in time (e.g. measures about experiences of happiness, 
worry, pain, tiredness) 

• Eudaimonia - a sense of meaning and purpose in life, or good psychological 
functioning (e.g. measures of feeling that the things you do in life are worthwhile). 


The strongest overlap between positive mental health and subjective well-being tends 
to be in the area of affect (where common mental health measures emphasise 
persistent experiences of certain affective states, such as worry, pain or tiredness) 
and eudaimonia (where many measures were explicitly developed to capture positive 
mental health). Additional concepts sometimes featured in measures of positive 
mental health, such as autonomy, optimism, resilience or environmental mastery, are 
not explicitly referenced in the OECD definition of subjective well-being provided 
above (Davydov et al., 2010[53]; Snow, 2019[54]; Peterson and Seligman, 
2004[55]; Conversano et al., 2010[56]; Ryff and Keyes, 1995[57]). Although these 
concepts are included in some (long-form) measures of eudaimonia and 
psychological functioning that are discussed in the aforementioned OECD 
Guidelines (see Annex 1 and Module D), appraisal styles such as optimism and other 
character traits are considered mediating factors that influence a person’s affective 
reactions to life circumstances, rather than final well-being outcomes to strive 
for (OECD, 2013[52]). 


The area of greatest conceptual difference between subjective well-being and positive 
mental health concerns life evaluation measures, which provide a very broad 
assessment of a person’s life in all its dimensions, rather than assessing only their 
mental health. Nevertheless, in practical terms, life evaluation measures are often 
included in research on (positive) mental health, since they are valuable as broad 
outcome measures that reflect a person’s perception of their well-being as a whole. 


Chapter 2 reviews current data collection practice in OECD countries for the three 
aspects of subjective well-being mentioned above, as well as for positive mental 
health summary scales (mostly stemming from positive psychology) that cover 
aspects of emotional, psychological and social well-being. Chapter 3, which discusses 
the statistical quality of mental health tools, focuses only on the latter, since 
the OECD Guidelines have already considered in depth the issue of measuring life 
evaluations, affect and eudaimonia (OECD, 2013[52]). The topic of measuring affect 
and eudaimonia specifically will also continue to be explored in future OECD 
workstreams on subjective well-being. Extremely broad definitions of positive mental 
health that include domains such as physical and sexual health, financial security, or 
academic and occupational performance (which are covered elsewhere in the OECD 
Well-being Framework) are not considered in this publication (Fusar-Poli and Santini, 
2022[58]; Fusar-Poli et al., 2020[59]; Harvard Center for Health and Happiness, 
n.d.[60]). 


Conclusion 


Measuring mental health is important to fully assess the well-being outcomes that 
matter to people’s lives. The aim of this report is to encourage official data producers 
to collect population-level data on mental health status more frequently and in an 
internationally harmonised manner, in order to understand how all societal groups, 
rather than only those in touch with the health care system, are faring, and to address 
a topic that is increasingly recognised as a public policy challenge. 


Mental health is a multifaceted concept that extends beyond a binary distinction 
between mental illness either being present or not. Considering all aspects of mental 
health can provide new avenues for the proactive rather than reactive design of 
mental health systems and services, draw attention to the importance of caring about 
positive mental health in its own right, and open up the space for policy to focus on 
both reducing illness and promoting good mental states. Collecting data on both 
aspects in household, social and health surveys would yield a more complete picture 
of mental health and help to better understand the drivers and policy levers needed 
for improving it. 
Notes 


1. As described later, mental ill-health in this report refers to diagnosable mental 
and behavioural conditions, as well as the transdiagnostic characteristic of general 
psychological distress. The terms mental health condition and mental disorder are 
used interchangeably in this report to refer to clinically significant symptoms of 
mental ill-health. Positive mental health covers psychological, emotional, and in some 
cases also social, relational and spiritual well-being. This report mainly focuses on the 
areas of positive mental health that have a strong overlap with the related concept of 
subjective well-being and that have been covered in-depth in the 2013 OECD 
Guidelines of Subjective Well-being, which define it as “good mental states, including 
all of the various evaluations, positive and negative, that people make of their lives, 
and the affective reactions of people to their experiences” (OECD, 2013[52]). 


2. For instance, the share of people at risk of depression in How’s Life? 2020 was 
reported only for European countries covered by the European Health Interview 
Survey (which is conducted only every five to six years), and information on negative 
affect balance (the share of the population reporting more negative than positive 
feelings and states) is currently sourced from the Gallup World Poll. Similarly, several 
of the surveys used to analyse inequalities in mental distress featured in the 2021 A 
New Benchmark for Mental Health Systems and Fitter Minds, Fitter Jobs workstreams 
were conducted before 2015 and use a variety of different (non-harmonised) 
instruments to measure distress. 


3. Current well-being comprises 11 dimensions: they relate to material 
conditions that shape people’s economic options (income and wealth, housing, work 
and job quality) and quality-of-life factors that encompass how well people are (and 
how well they feel they are), what they know and can do, and how healthy and safe 
their places of living are (health, knowledge and skills, environmental quality, 
subjective well-being, safety). Quality of life also encompasses how connected and 
engaged people are, and how and with whom they spend their time (work-life 
balance, social connections, civic engagement). Resources for future well-being are 
expressed in terms of a country’s investment in (or depletion of) different types of 
capital resources that last over time but that are also affected by the decisions taken 
(or not taken) today, and these include economic capital (man-made and financial 
assets), natural capital (stocks of natural resources, land cover, species biodiversity, 
as well as ecosystems and their services), human capital (skills and the future health 
of individuals) and social capital (social norms, shared values and institutional 
arrangements that foster cooperation) (OECD, 2020[4]). 


4. The indicator dashboard accompanying the OECD Well-being Framework 
differentiates between current well-being and the resources needed to sustain it, 
relying on different indicators for the two domains - hence, only premature mortality 
and obesity prevalence are included in the human capital indicator set (Exton and 
Fleischer, 2023[61]). However, people’s physical and mental health, which are 
covered in other dimensions, influence their opportunities in later life and are 
conceptually within the scope of human capital. 


5. This broad view is also mirrored in the WHO’s definition of health more broadly as 
“complete physical, mental and social well-being and not merely the absence of 
disease or infirmity”. 


6. Various names for dual-continua models have been proposed, including the dual- 
factor model, two-factor, two-continua, the complete state model and complete 
mental health. 


7. Prevalence of mental health, or flourishing, was defined here as both symptoms 
of hedonia and positive functioning, and it is measured by six questions about positive 
affect, Ryff’s scales of psychological well-being and Keyes’ scales of social 
well-being (Keyes, 2005[11]). 


8. These studies were performed in clinical and non-clinical populations, over the 
entire life-course, and in Western and non-Western populations, and included studies 
specifically recruiting minority and at-risk groups. 


9. Both the Diagnostic and Statistical Manual of Mental Disorders (DSM) of the 
American Psychiatric Association and the WHO International Classification of Diseases 
(ICD) list a set of criteria that are needed for a diagnosis of a specific mental health 
condition to be met (World Health Organization, 2021[43]; American Psychiatric 
Association, 2013[42]). These criteria, which vary depending on the specific disorder, 
specify the nature and number of symptoms and the level of distress or impairment 
required, and are used to exclude cases where symptoms can be directly attributed to 
general medical conditions, such as a physical injury, or an expectable or culturally 
approved response to a common stressor or loss, such as the death of a loved one. 
Mainly used and developed in the United States by American psychiatry experts, the 
DSM is a specified classification system for mental disorders only, while the ICD is an 
overarching joint classification system for both physical and mental disorders. The 
first version of the DSM was published in 1952 and included 106 specific diagnoses. It 
has since been revised several times with the latest version (DSM-5) having been 
published in 2013, listing a total of 157 diagnoses and close to 300 disorders. A text 
revision (DSM-5-TR) was released in March 2022 that includes among other things 
updated diagnostic criteria and diagnostic codes, Prolonged Grief Disorder as a new 
mental health condition, and considerations of the impact of racism and 
discrimination on mental disorders (American Psychiatric Association, 2022[64]). The 
ICD has a chapter (chapter F) devoted specifically to psychiatric disorders and is also 
regularly updated, with version 11 published in 2019. Although the two systems 
present minor differences, they are based on similar sets of rules and assumptions. 


10. The Institute for Health Metrics and Evaluation’s burden of disease estimates for 
these mental health disorders are based on a wide variety of data sources and a set of 
complex assumptions regarding prevalence of a given disorder or risk factor and the 
relative harm it causes to quality of life and premature mortality. 


11. The way positive mental health is conceived, sometimes with greater focus and 
sometimes more broadly, is apparent in the way different government agencies 
across the OECD have operationalised the concept: the Canadian Positive Mental 
Health Surveillance Indicator Framework defines it as “a state of well-being that 
allows us to feel, think, and act in ways that enhance our ability to enjoy life and deal 
with the challenges we face” (Government of Canada, n.d.[62]), whereas the Finnish 
Institute for Health and Welfare describes it as “various levels of emotional (feelings), 
psychological (positive actions), social (relationships with others and society), physical 
(physical health and fitness) and spiritual (the sense that life has a meaning) 
wellbeing” (Finnish Institute for Health and Welfare, n.d.[63]). 


2. Measuring population mental health: Tools 
and current country practice 


Abstract 


A variety of tools are available for monitoring population mental health, ranging from 
administrative data to different types of survey questions. Although many OECD 
countries began collecting new or additional mental health data during COVID-19, 
official data producers were already active in this space well before the pandemic 
started. However, there is room for improvement by increasing the frequency of 
(survey) data collection, diversifying the types of indicators used to cover the full 
spectrum of mental health, and expanding the international harmonisation of existing 
measures. Here, data collectors could: (1) beyond screening tools focusing on 
symptoms of depression, expand use to those including symptoms of anxiety as 
outcome measures; (2) move towards collecting harmonised information on affective 
and eudaimonic aspects of positive mental health; and (3) explore using single-item 
questions on general mental health status across surveys. 


The frequent collection of population-level data on mental health outcomes is 
important for identifying populations at-risk for mental ill-health, for determining 
which socio-economic and other factors shape (and are shaped by) people’s mental 
health, and for designing effective prevention and promotion strategies. As outlined in 
Chapter 1, mental health is a multifaceted concept and exists beyond a binary 
distinction between the presence or absence of mental illness. Collecting data on both 
mental ill-health and positive mental health in population surveys and mental health 
assessments would yield a more complete picture of people’s overall mental health 
and help to better understand the drivers and policy levers associated with improving 
it. 

However, the current lack of (internationally) standardised data on population mental 
health makes it difficult to assess the efficacy of different policy approaches across 
disparate contexts; standardising outcome measures is the first step in facilitating 
such analysis. This chapter outlines the tools available to data collectors, gives an 
overview of current data collection practices across OECD countries and offers 
suggestions for which outcomes to prioritise in international harmonisation efforts. 


An analysis of responses to a questionnaire sent to official data producers in OECD 
countries in 2022 shows that all member states that answered are already active in 
this space. Prior to the pandemic, almost all OECD members were already collecting 
information on mental health outcomes in both health interviews and general 
household surveys, as well as via administrative data. COVID-19 has sparked 
additional interest in measuring population mental health, with many public agencies 
and statistical offices adding items to both new and existing surveys. 


These existing data collections demonstrate the interest in, and relevance of, 
population mental health outcomes in a national statistics context. Yet there is room 
for improvement in several areas: the frequency of data collection; greater data 
availability across the full spectrum of both negative and positive mental health 
outcomes; and better harmonisation of measures across countries to improve 
international comparability. 


Indeed, prior to the pandemic most mental health data were collected by countries in 
surveys that ran every four to ten years. While many introduced high-frequency 
surveys with mental health modules in the first two years of COVID-19, it is currently 
unclear whether these surveys will continue to be implemented moving forward. 
Further, although all statistical offices collect data on mental ill-health - with a 
particular focus on common mental disorders - general psychological distress and 
depressive symptoms tend to be captured through standardised screening tools, 
whereas measures of experiencing anxiety are less harmonised across countries. Data 
collection efforts for other mental conditions - such as post-traumatic stress disorder 
(PTSD), bipolar disorder, eating disorders, etc. - and for other aspects of mental 
health - such as suicidal ideation and mental health-related stigma - remain very 
uneven across countries. When it comes to positive mental health, cross-country 
comparative data are mainly limited to measures of life evaluation. Other aspects, 
such as affect and eudaimonia, are much less frequently collected as outcome 
measures, and when they are, the tools used are less likely to be standardised across 
countries. 


The results of the OECD questionnaire suggest that existing data collection efforts are 
not capturing the full range of mental health outcomes - missing aspects of both 
mental ill-health and positive mental health. In order to capture these outcomes 
and collect frequent information on mental health, data collectors in OECD member 
countries could: (1) beyond screening tools focusing on symptoms of depression, 
expand use to those including symptoms of anxiety as outcome measures; (2) move 
towards collecting harmonised information on affective and eudaimonic aspects of 
positive mental health; and (3) explore using single-item questions on general mental 
health status across surveys. 


Which tools are available for measuring population mental 
health? 


While Chapter 1 focused on relevant types of outcomes (covering both mental 
ill-health and positive mental health) for data collectors interested in mental 
health, this chapter focuses mainly on the types of tools that can be used to measure 
these. 


The broad tool types discussed in this chapter - some of which are sourced from 
administrative data, but the bulk of which come from household surveys - range from 
long survey modules to a battery of question items to single questions. Some tools 
can be used to capture aspects of either mental ill-health or positive mental health, 
while others are used only for specific types of outcomes. Each type of tool has its 
own advantages and disadvantages, requiring data collectors to select among them, 
depending on the needs and constraints of their specific contexts. The different tools 
are described below in order to provide a common understanding of the 
categorisation used in this report. 


The chapter annexes contain in-depth information for readers interested in further 
details. Annex 2.A provides an overview of which specific tools are collected by each 
country, along with sample question framing and answer options. Annex 2.B lists full 
details, including question wording and scoring recommendations, for the most 
commonly used standardised instruments. More detailed reflections on the statistical 
quality of mental health survey measures are addressed in Chapter 3. 


Tools sourced from administrative data 


Administrative data can contain information on the use of mental health 
services, diagnoses of mental disorders in clinical settings, as well as cause of 
death data from suicide and substance abuse (i.e. drug overdoses and alcohol 
abuse). 


While all of these can be considered objective (i.e. not self-reported) and easy-to- 
collect proxies of mental ill-health, measurement challenges remain. For instance, 
measures of service use and medical diagnoses do not capture population outcomes, 
but rather only those who are willing and able to access health care services. Such 
measures can overestimate comparative levels or incidence rates in countries with 
good (and affordable) medical systems, awareness programmes and less stigma, 
where people are more likely to both seek and receive treatment. In addition, 
preventing ill-health necessitates tracking outcomes prior to, and following, 
engagement with the service sector. This report does not consider administrative 
statistics related to health care further, referring readers to (OECD, 2021[1]). 


Data on causes of death due to suicide or substance abuse (which are commonly 
referred to as “deaths of despair” (Case and Deaton, 2017[2])) do capture mental ill- 
health outcomes at the population level. These measures can act as proxies for 
severe mental illness and addiction. While there are social and cultural reasons 
affecting suicidal behaviours - meaning that not all suicides are the direct result of 
mental ill-health - living with mental health conditions does substantially increase the 
risk of dying by suicide (OECD, 2021[1]). However, the registration of suicide deaths is 
a complex procedure, affected by factors such as how intent is ascertained, who 
completes the death certificate, and prevailing norms and stigma around suicide, all 
potentially affecting the cross-country comparability of mortality records (OECD, 
2021[1]). 


A general limitation for all types of administrative data is that the additional socio- 
demographic data collected alongside are often limited to the age, sex, geographic 
region and potentially the race/ethnicity of the deceased. This constrains the ability to 
delve into the drivers of mental health and to identify relevant socio-economic, 
environmental and relational risk and resilience factors. 


Tools sourced from household surveys 


In contrast to administrative data, population surveys generally contain information 
on respondents’ material conditions (e.g. income, wealth, labour market outcomes, 
housing quality), quality of life (e.g. physical health, educational attainment, 
environmental quality) and relationships (e.g. social connections, trust, safety). 
Population surveys can have a specific content focus, such as a health survey, or a
more general scope, such as general social surveys. These surveys are conducted at 
the household level, with more in-depth modules on employment, health (including 
mental health), education, etc., administered to selected household members. Having 
a full range of well-being covariates is important to understand how mental health is 
impacted by, and how it in turn influences, other areas of people’s life. Furthermore, 
tracking (and eventually achieving) equity in mental health outcomes requires 
disaggregation by important socio-demographic categories. 


Tools that have been included in household surveys to assess specific mental health 
outcomes range from single-item questions to standardised batteries of items. A brief 
description of each can be found below, with full details in Annex 2.A and Annex 2.B. 


• Questions about previous diagnoses - This refers to single-item questions
about whether an individual has been diagnosed with a mental health disorder (e.g.
major depressive disorder, generalised anxiety disorder, or other mental health
conditions) by a health care worker, either in the past 12 months or over the course of
his/her lifetime. These questions typically have yes/no answers and are not
standardised across countries. For full details, see Table 2.6. Examples include:


o “Have your mental health problems ever been diagnosed as a mental
disorder by a professional (psychiatrist, doctor, clinical psychologist)? Yes / No”.
o “Have you EVER been told by a doctor or other health professional that
you had ...Any type of depression? Read if necessary: Some common types of
depression include major depression (or major depressive disorder), bipolar
depression, dysthymia, post-partum depression, and seasonal affective disorder. Yes /
No”.


• Questions about experienced symptoms - This refers to single-item
questions about symptoms of mental disorders experienced in the past 12 months or
over the course of an individual’s lifetime, without explicitly referring to a diagnosis by
a medical professional. These questions typically have yes/no answers and are not
standardised across countries. For full details, see Table 2.7. Examples include:


o “During the past 12 months, have you had any of the following diseases
or conditions? Depression” (Yes / No).
o “Have you ever suffered from chronic anxiety?” (Yes / No).
o “Do you have a mood disorder? Yes / No”.

• Questions about suicidal ideation and suicide attempts - These are
(usually) single-item questions about a respondent’s experience of suicidal ideation,
self-harm behaviours or suicide attempts. These questions typically have yes/no
answers and are not standardised across countries. Recall periods refer to an
individual’s lifetime, the last 12 months, the past two weeks, or “during COVID”. For
full details, see Table 2.8. Examples include:


o “Have you seriously contemplated suicide since the COVID-19 pandemic
began? Yes/No”.
o “Sometimes people harm themselves on purpose but they do not mean to
take their life. In the past 12 months, did you ever harm yourself on purpose but not
mean to take your life? Yes/No”.
o “Have you ever attempted suicide? Yes/No”.
o “Did you stay in a hospital overnight or longer because you tried to kill
yourself? Yes/No”.

• Questions about general mental health status - These refer to single-item
questions on how respondents rate their mental health overall, and thus capture both
components of ill-health and positive mental health. Questions are not standardised
across countries and differ in terms of question wording, response options and recall
period. For full details, see Table 2.9. Examples include:


o “In general, how is your mental health? Excellent / Very good / Good /
Fair / Poor”.
o “Has your mental health/well-being been affected by the Covid-19
pandemic during the last 12 months?”
o “On a scale from 1 to 10 can you indicate to what extent you are satisfied
with your mental health? A score of 1 refers to completely dissatisfied and a 10 to
completely satisfied”.
o “Does your mental state interfere with your daily life at work? your family
life? Yes / No”.

• Positive mental health indicators - This refers to questions pertaining to the
various aspects of positive mental health: life evaluation, affect (summary affect
scales, and batteries of questions on positive, negative or mixed affect), eudaimonia
(questions about quality of life, whether life is worthwhile or meaningful), as well as
standardised positive mental health composite scales (combining different dimensions
of positive mental health, prioritising positive over negative affect, and sometimes
adding a social well-being component). In some instances, positive mental health
indicators are single-item questions that vary across countries and surveys, while in
others they are standardised batteries of questions. Standardisation across countries
varies, with only life evaluation questions and positive mental health composite scales
being consistently phrased. For full details, see Table 2.10. Specific question item
phrasing and scoring suggestions for standardised composite scales can be found
in Annex 2.B.


• Screening tools - These refer to multi-item instruments designed to screen
respondents for symptoms (rather than for diagnoses) of mental health conditions.
These tools were initially developed in clinical settings to screen for common mental
disorders, identifying individuals who may be at risk and flagging them for further
screening and potential diagnosis. They can be interviewer-led or self-administered
and focus either on general psychological distress or on specific mental health
conditions such as major depressive disorder, generalised anxiety disorder (and
sometimes a combination of the two), alcohol use disorder, post-traumatic stress
disorder, eating disorders and so on. These tools are considered “validated” in that
they have been psychometrically tested for their validity (against the gold standard of
structured interviews or diagnoses), sensitivity (the probability of correctly identifying
a patient with the condition) and reliability (the measures produce consistent results
when an individual is interviewed under a given set of circumstances) (refer to
Chapter 3 for an extended discussion of statistical quality). A wide variety of
screening tools are available, ranging from very short screeners of two items to longer
instruments covering 20 items or more. The focus of questions varies between
screening tools: all cover the frequency of experiencing (mostly negative) affect (e.g.
feeling low, feeling nervous, feeling worthless), with some also including somatic
symptoms (e.g. changed appetite, trouble sleeping) and/or functional impairment due
to emotional distress (e.g. disturbance in daily activities, not being able to
concentrate, not being able to stop worrying). Screening tools also differ in terms of
reference period for symptoms, ranging from the past week to the past month;
however, none are able to measure lifetime prevalence. Given these differences,
screening tools are not always directly comparable and
should not be used interchangeably for international comparisons. Item scores are
typically summarised in a summary index, with the final score being used either as a
continuous measure of mental ill-health or to assess the risk of a common mental
health condition using a validated cut-off score. For full details, refer to Table 2.5.
Exact question item wording and scoring recommendations for the most frequently
used screening tools can be found in Annex 2.B.


• Structured interviews - Structured interviews are considered the gold
standard for measuring mental disorders (often both on a lifetime and 12-month
basis). They provide a standardised assessment based on the internationally agreed
definitions and criteria of recognised psychiatric classification systems and have
strong diagnostic reliability and psychometric properties to determine whether or not
a respondent has the condition of interest (Mueller and Segal, 2015[3]; Burger and
Neeleman, 2007[4]).1 They are administered by trained interviewers, with
close-ended and fully scripted questions and standardised scoring of responses (Ruedgers,
2001[5]). Structured interviews approximate assessments conducted by mental
health professionals and in this way can identify populations at risk for mental health
conditions even if these individuals have not been diagnosed by a health care
professional. For additional information on the most commonly used structured
interview, the Composite International Diagnostic Interview (CIDI),
see Table 2.4 and Annex 2.B.

• Additional mental-health related topics - This category refers to questions
on any other relevant topics, including the use of mental health medication and
services, the mental health of children and young people in the household, loneliness
and stress, resilience and self-efficacy, attitudes towards mental health including
stigma and literacy, and questions on unmet needs. For additional information,
see Table 2.11.
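
The scoring step described for screening tools in the list above (summing item scores into a summary index, then applying a validated cut-off) can be sketched as follows. This is a minimal illustration rather than an official scoring routine: the `ScreeningTool` class and function names are hypothetical, the 0-3 item range is the standard PHQ response scale, and the cut-offs (10 for the PHQ-9, 3 for the PHQ-2) are those cited in the note to Figure 2.2.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ScreeningTool:
    """Scoring rule for a summed-item screener (hypothetical helper class)."""
    name: str
    n_items: int
    max_item_score: int  # items are assumed to be scored 0..max_item_score
    cutoff: int          # scores at or above this value are flagged as at risk


# Cut-offs as cited in the note to Figure 2.2; item range is the usual 0-3 scale.
PHQ9 = ScreeningTool("PHQ-9", n_items=9, max_item_score=3, cutoff=10)
PHQ2 = ScreeningTool("PHQ-2", n_items=2, max_item_score=3, cutoff=3)


def summary_score(tool: ScreeningTool, items: Sequence[int]) -> int:
    """Sum the item responses after checking their number and range."""
    if len(items) != tool.n_items:
        raise ValueError(f"{tool.name} expects {tool.n_items} items, got {len(items)}")
    if any(not 0 <= x <= tool.max_item_score for x in items):
        raise ValueError(f"{tool.name} items must lie in 0..{tool.max_item_score}")
    return sum(items)


def at_risk(tool: ScreeningTool, items: Sequence[int]) -> bool:
    """Apply the cut-off to the summary score (a screen, not a diagnosis)."""
    return summary_score(tool, items) >= tool.cutoff


if __name__ == "__main__":
    responses = [1, 2, 1, 0, 3, 1, 1, 1, 0]  # made-up PHQ-9 responses
    print(summary_score(PHQ9, responses), at_risk(PHQ9, responses))
```

The same summary score can be kept as a continuous measure or collapsed into the binary at-risk flag, which is why reporting should state both the tool and the cut-off used.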


Trade-offs between tool types 


All tools imply trade-offs in terms of response burden/ease and cost of data collection,
accuracy and coverage (Figure 2.1, Table 2.1). Response burden is a direct function of
how much time an individual needs to spend to provide information on their mental
health status and how much stress is caused by providing this information. Accuracy
refers to the sensitivity of a tool in correctly identifying a person with a mental health
condition, whereas coverage entails whether the measure in question is applied to the
full (adult) population.


By way of illustration, administrative data have a low response burden: they do not 
require answers from individual respondents and are routinely collected within a 
country’s data infrastructure. Yet statistics on deaths of despair focus only on the 
extreme end of mental ill-health and are further complicated by the fact that not all 
deaths of despair may be the culmination of a mental disorder. Furthermore, unlike 
household surveys, only those who were in contact with the health care system are 
captured by administrative records of diagnoses in a clinical setting.2 


For household surveys, both the response burden and accuracy increase the longer 
and more specific a tool is: whereas single questions about experienced symptoms or 
a person’s general mental health status are short and easy to answer, they do not 
consider the nature or severity of symptoms, or the type of mental health condition, 
and have not been benchmarked against diagnostic criteria. Screening tools have 
been validated against the gold standard of structured interviews and are, depending 
on the specific tool used and the number of items covered, still relatively low cost in 
terms of response burden. However, they do not constitute a diagnosis from a health 
care professional and can only identify people likely at risk of disorders. Screening 
tools are validated against clinical diagnoses, and are thus designed to maximise 
likeness to diagnostic interviews to the extent possible. Still, when calibrating tools 
and cut-off scores, there is a trade-off between sensitivity (correctly identifying the 
presence of a mental health condition) and specificity (correctly noting the absence of 
a mental health condition), and researchers often prioritise the former rather than the 
latter, leading to slight overestimates by design (see Box 3.3 and Section 3.3.1 for a
more detailed discussion). Finally, the majority of tools included in both household 
surveys and administrative data focus on mental ill-health; the only exceptions are 
household survey questions about general mental health and positive mental health. 
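
The calibration trade-off between sensitivity and specificity can be made concrete with a small sketch. The scores and gold-standard labels below are invented purely for illustration (they are not data from this report), and the function name is hypothetical; the point is only that lowering a cut-off raises sensitivity at the expense of specificity.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    scores: Sequence[int], has_condition: Sequence[bool], cutoff: int
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) when flagging scores >= cutoff."""
    tp = fn = tn = fp = 0
    for score, truth in zip(scores, has_condition):
        flagged = score >= cutoff
        if truth and flagged:
            tp += 1      # condition present, correctly flagged
        elif truth:
            fn += 1      # condition present, missed by the screener
        elif flagged:
            fp += 1      # condition absent, wrongly flagged
        else:
            tn += 1      # condition absent, correctly not flagged
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case with and one without the condition")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical screener scores and structured-interview results for ten respondents.
SCORES = [2, 4, 6, 8, 9, 11, 12, 14, 16, 20]
LABELS = [False, False, False, False, True, False, True, True, True, True]

if __name__ == "__main__":
    for cutoff in (8, 10, 12):
        sens, spec = sensitivity_specificity(SCORES, LABELS, cutoff)
        print(f"cut-off {cutoff}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this toy sample, moving the cut-off from 10 down to 8 catches every true case but flags more respondents who do not have the condition, which is the direction of error the text describes as prioritising sensitivity.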


The difference in question framing and item length - between structured interviews, 
screening tools and single-item questions on experienced symptoms or received 
diagnoses - can lead to different estimates of prevalence for the same reported 
outcome measure (Box 2.1). This speaks to the need for the standardisation of tool 
type (and transparency about which tool was used) when comparing outcomes across 
countries, over time and across population groups: i.e. mixing types of tools when 
commenting on outcomes like “share at risk for depression” or “share at risk for 
psychological distress” can lead to different estimates because of measurement 
differences, rather than because of differences in underlying mental health status 
(refer to Chapter 3 for an extended discussion of these themes). 


Figure 2.1. Trade-off between response burden and accuracy for mental 
health measurement tools 
Source: Adapted from a presentation given by Statistics Canada at the OECD conference
“Well-being and mental health - towards an integrated policy approach” in December
2021.


Table 2.1. Advantages and limitations of different tools to measure mental health


Tool: Administrative data (deaths of despair from suicide, drug overdose, alcohol
abuse; diagnoses of common mental disorders in clinical care settings)

Advantages:
• No response burden for individuals
• Possibility to link across other administrative data (e.g. health system quality)
• Less costly and more readily available than other types of data
• Clinical care data can provide some insight into lifetime and specific time period
(e.g. past 12 months) prevalence estimates for a range of ill-health conditions when
other data sources are not available

Limitations:
• Captures only those who sought treatment, were correctly coded by a health
professional and are part of the reporting database
• “Cause of death” data need to be correctly coded, do not account for suicide
attempts or substance abuse not leading to death, and only capture the extreme end
of mental ill-health
• Often difficult, or even impossible, to interpret (without supplemental information)
whether changes in diagnostic rates are driven by changes in underlying prevalence
of mental health conditions or by other factors such as changes to affordability or
accessibility of care, changes in help-seeking behaviour, etc.
• Limited contextual information on well-being covariates


Tool: Household surveys: questions about previous diagnoses

Advantages:
• Relatively easy to understand for respondents
• Minimal response burden (usually a single binary question)
• Can provide both lifetime and specific time period (e.g. past 12 months) prevalence
estimates for a range of ill-health conditions

Limitations:
• Captures only those who sought treatment and were diagnosed by a health
professional
• Evidence that these questions lead to social desirability bias and higher rates of
refusal and non-response (see Chapter 3)
• Limited contextual information on the nature and severity of symptoms


Tool: Household surveys: questions about experienced symptoms of mental health
conditions

Advantages:
• Minimal response burden (usually a single binary question)
• Can provide both lifetime and specific time period (e.g. past 12 months) prevalence
estimates for a range of ill-health conditions

Limitations:
• Potential for confusion for respondents in terms of whether the question refers to
an actual diagnosis or their self-assessment, though evidence suggests this type of
tool is closely related to questions about previous diagnoses by health professionals
• Limited contextual information on the nature and severity of symptoms
• Over-reporting of true prevalence - not a complete assessment or an actual
diagnosis, does not consider symptoms


Tool: Household surveys: questions on general mental health status

Advantages:
• Relatively easy to understand for respondents
• Minimal response burden (usually a single question)
• Captures a respondent’s global evaluation of their mental state, and hence both
ill-health and positive aspects

Limitations:
• Has not been validated against structured interviews or other diagnostic tools, no
established threshold in the tools as to what constitutes at-risk respondents
• Generally less of an existing evidence base, though available studies suggest this
to be a useful measure
• Limited contextual information on the nature and severity of symptoms or the type
of mental health condition


Tool: Household surveys: indicators of positive mental health

Advantages:
• Relatively easy to understand for respondents
• Minimal response burden (usually single or limited-item questions)
• Focus on psychological and emotional well-being or flourishing
• International measurement guidance exists (e.g. OECD Guidelines on Measuring
Subjective Well-being)

Limitations:
• No reference point of what (true and/or desired) prevalence should be
• Recall period for questions typically ranges from day prior to past 4 weeks; cannot
provide lifetime estimates


Tool: Household surveys: screening tools

Advantages:
• Easy to administer and reduced response burden compared to structured interviews
• Have been validated against structured interviews or other diagnostic tools
• Can capture undiagnosed conditions

Limitations:
• Over-reporting of true prevalence - not a complete assessment or an actual
diagnosis
• Recall period for questions typically ranges from day prior to past 4 weeks; cannot
provide lifetime estimates


Tool: Household surveys: structured interviews

Advantages:
• Approximates true prevalence - near gold standard
• Can capture undiagnosed conditions
• Extensive contextual information of the respondents’ lives can be taken into account

Limitations:
• Very complex to develop and administer, including interviewer training
• Many questions for people who have symptoms
• Lack of survey measurement tools available to map to most up-to-date diagnostic
guidelines (DSM-5)


Source: Adapted from a presentation given by Statistics Canada at the OECD conference “Well- 
being and mental health - towards an integrated policy approach” in December 2021. 

Box 2.1. Prevalence rates vary depending on the measurement tool 
used 


Prevalence rates for specific mental health conditions will vary - at times substantially 
- depending on the type of tool used to create the estimate. Screening tools are likely 
to overstate population level prevalence of mental disorders by design. They were 
developed in clinical settings to identify individuals at risk for common mental 
disorders, who can then be flagged for further observation and actual diagnoses - 
some of whom may not end up being diagnosed or needing further
treatment (National Academies of Sciences Engineering and Medicine, 2021[6]; Topp
et al., 2015[7]). In contrast, questions that require individuals to report whether they 
have been diagnosed with a mental disorder by a health care professional in the past, 
or currently live with a specific disorder, focus on those in touch with the health care 
system and are therefore likely to understate population prevalence. 
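
One way to see why a screener tends to overstate prevalence is the standard identity for the share of a population that screens positive: true positives (sensitivity times true prevalence) plus false positives (one minus specificity, times the share without the condition). The sketch below applies that identity; the function name and the numbers are hypothetical and chosen only for illustration, not taken from any tool discussed in this report.

```python
def screen_positive_share(
    true_prevalence: float, sensitivity: float, specificity: float
) -> float:
    """Expected share of the population flagged by a screener."""
    for value in (true_prevalence, sensitivity, specificity):
        if not 0.0 <= value <= 1.0:
            raise ValueError("all inputs must be proportions between 0 and 1")
    true_positives = sensitivity * true_prevalence
    false_positives = (1.0 - specificity) * (1.0 - true_prevalence)
    return true_positives + false_positives


if __name__ == "__main__":
    # Hypothetical values: 5% true prevalence, 85% sensitivity, 90% specificity.
    share = screen_positive_share(0.05, 0.85, 0.90)
    print(f"share screening positive: {share:.4f}")
```

When the condition is relatively rare, even a modest false-positive rate is applied to most of the population, so the flagged share can exceed the true prevalence by a wide margin.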


On the first point, Figure 2.2 below shows that screening tools may overestimate
population prevalence as compared to structured interviews. The figure shows
national estimates of the same outcome measure - prevalence of major depressive
episodes (MDE) - in three OECD countries as measured by the CIDI, a structured interview,
and by the Patient Health Questionnaire (PHQ), a screening tool. (The version of the
PHQ varies by country: PHQ-9 in Canada and Korea, PHQ-2 in the United States. Refer
to Annex 2.B for the specific items included in each iteration.) Within each country, the
CIDI and the screening tool are fielded in different surveys, so care
should be taken in making direct comparisons; generally, however, the prevalence of MDE as
measured by the CIDI is lower than that measured through screening tools. The
exception is the United States, which also shows the smallest difference between the
estimates. This may be in part because many mental health survey tools were first
developed, and subsequently extensively validated, in the United States, making the
calibrations between different tools more precise.


Figure 2.2. Screening tools typically show greater prevalence of major 
depressive disorder than do structured interviews 
Prevalence of major depressive episodes (MDE), over the past 12 months vs. 
past 2 weeks, as estimated by CIDI and screening tools (PHQ) 
Note: For all three countries, the structured interview used is the CIDI, which is used to
measure the prevalence of Major Depressive Episodes (MDE) over the past 12 months. In 
Korea, these estimates are adjusted for age and sex. In Canada and the United States, 
these estimates are nationally representative for the 15+ and 18+ population, 
respectively. The validated screening tool used by Canada is the PHQ-9 (MDE defined as 
having a score >= 10); the PHQ-9 is used by Korea (being at risk for depression is defined 
as having a score >= 10; although not described by KOSIS, Korea’s statistical service, as 
a risk for MDE, this same scoring convention is used by Canada to measure MDE); and the 
PHQ-2 is used by the United States (symptoms of a depressive disorder are defined as 
having a score >= 3). The PHQ-9 and PHQ-2 both have a reference period of the past 2 
weeks. For the United States, the PHQ-2 measures the share with symptoms of a 
depressive disorder, rather than experience of MDE. Refer to Annex 2.B for more 
information on individual screening tools. 
Source: Structured interview data for Canada come from Statistics
Canada (2013[8]), Canadian Community Health Survey: Mental Health, 2012, The
Daily, https://www150.statcan.gc.ca/n1/daily-quotidien/130918/dg130918a-eng.htm;
PHQ-9 data for Canada are derived from Dobson, K. et al. (2020[9]), “Trends in the prevalence
of depression and anxiety disorders among Canadian working-age adults between 2000
and 2016”, Health Reports, Vol. 31/12, pp. 12-23,
https://doi.org/10.25318/82-003-X202001200002-ENG; structured interview data for Korea
come from KOSIS (n.d.[10]), Annual prevalence of mental disorders (adjusted for sex and
age) (database), Korean Statistical Information Service,
https://kosis.kr/statHtml/statHtml.do?orgid=117&tblld=TX_117_ 2009 HBO27&conn_path=12;
PHQ-9 data for Korea come from KOSIS (n.d.[11]), Depressive disorder prevalence
(database), National Health and Nutrition Survey, Korean Statistical Information
Services, https://knhanes.kdca.go.kr/knhanes/sub01/sub01_05.do#none; structured
interview data for the United States come from SAMHSA (2019[12]), Key Substance Use
and Mental Health Indicators in the United States: Results from the 2018 National Survey
on Drug Use and Health (database), Substance Abuse and Mental Health Services
Administration, Rockville, MD,
https://www.samhsa.gov/data/sites/default/files/cbhsq-reports/NSDUHNationalFindingsReport2018/NSDUHNationalFindingsReport2018.pdf;
PHQ-2 data for the United States come from the National Center for Health
Statistics (2021[13]), Estimates of Mental Health Symptomatology, by Month of Interview:
United States, 2019 (database), U.S. Department of Health and Human Services, Centers
for Disease Control and Prevention,
https://www.cdc.gov/nchs/data/nhis/mental-health-monthly-508.pdf.

StatLink https://stat.link/qSbvk9 


On the second point, Figure 2.3 shows that the share of the population reporting ever
having received a diagnosis for a given mental disorder is much lower than the share
deemed, based on answers to screening tools, to be at risk of poor mental health;
this gap is often a function of affordability and access to health care, along
with stigma and mental health illiteracy affecting health-seeking behaviours. The
left-hand side of the figure displays the share of respondents who are at risk for
psychological distress or low levels of positive mental health, including: (1) those at
risk for depression, as defined by a scoring convention of the Short Form-12 mental
health summary component (SF-12); (2) those at risk for a probable common mental
disorder, as measured by the General Health Questionnaire-12 (GHQ-12); and (3)
those who have poor mental well-being, as defined by a scoring convention of the
Short Warwick-Edinburgh Mental Wellbeing Scale (SWEMWBS). (Refer
to Table 2.5 and Table 2.10, along with Annex 2.B, for more information on the three
tools.) The right-hand side of the figure shows the share of respondents who report
having ever received a diagnosis for a range of specific mental health conditions.


Figure 2.3. The share of those reporting a diagnosis of a mental health 
condition is much lower than the share identified as experiencing 
psychological distress by screening tools 
Note: Scoring information for each of the screening tools included: risk for depression is
defined as having a score <= 45 on the transformed SF-12 mental health component
composite scale, where 0 indicates worst mental health and 100 best possible mental
health; risk for a probable common mental disorder (CMD) is defined as having a score
>= 4 on the GHQ-12, as used in Woodhead et al. (2012[14]); poor mental well-being is
defined as having a SWEMWBS score more than one standard deviation below the sample
average. Refer to Annex 2.B for more information on individual screening tools.

Source: OECD calculations based on University of Essex, Institute for Social and Economic
Research (2022[15]), Understanding Society: Waves 1-11, 2009-2020 and Harmonised
BHPS: Waves 1-18, 1991-2009 (database), 15th Edition, UK Data Service, SN: 6614,
http://doi.org/10.5255/UKDA-SN-6614-16, from wave 10 only (Jan 2018 - May 2020).


StatLink https://stat.link/9lqxu4 
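
The three scoring conventions listed in the note to Figure 2.3 can be sketched in a few lines. The thresholds come from that note; the function names and the sample scores are hypothetical, and the use of the sample standard deviation (`statistics.stdev`) for the SWEMWBS rule is an assumption, since the note does not specify which estimator was used.

```python
from statistics import mean, stdev
from typing import List, Sequence


def sf12_at_risk(mcs_score: float) -> bool:
    """Risk for depression: transformed SF-12 mental component score <= 45 (0-100 scale)."""
    return mcs_score <= 45


def ghq12_probable_cmd(score: int) -> bool:
    """Probable common mental disorder: GHQ-12 score >= 4."""
    return score >= 4


def swemwbs_poor_wellbeing(scores: Sequence[float]) -> List[bool]:
    """Flag scores more than one standard deviation below the sample average.

    Assumption: the sample standard deviation is used; the threshold is
    relative to the sample, so it changes from one dataset to another.
    """
    threshold = mean(scores) - stdev(scores)
    return [s < threshold for s in scores]


def share(flags: Sequence[bool]) -> float:
    """Share of respondents flagged."""
    return sum(flags) / len(flags)


if __name__ == "__main__":
    swemwbs_sample = [14, 20, 22, 24, 25, 26, 28, 30]  # made-up scores
    flags = swemwbs_poor_wellbeing(swemwbs_sample)
    print(flags, share(flags))
```

Because the SWEMWBS rule is defined relative to the sample distribution, it flags a broadly similar share in any sample, whereas the SF-12 and GHQ-12 rules use fixed thresholds; this is one reason the three shares in the figure are not interchangeable.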


Which population mental health data are OECD countries 
already collecting? 


In February and March of 2022, 37 of 38 OECD countries provided answers to a 
questionnaire designed by the OECD Secretariat to better understand what OECD 
countries are doing in terms of measuring mental health outcomes.3 The 
questionnaire covers the statistical tools used (questions about diagnoses, 
experienced symptoms, screening tools and structured interviews) and outcomes 
covered (mental ill-health, positive mental health and other related topics, including 
loneliness, stress, attitudes towards mental health, etc.). A discussion of mental 
health data related to service use and access to care is set out in A New Benchmark 
for Mental Health Systems (OECD, 2021[1]), and this new round of surveying seeks to 
build upon existing work by primarily focusing on mental health outcomes, rather than 
on service use or access to care, and in particular on outcomes that could be 
measured through household surveys rather than administrative data. 


All OECD countries already collect both administrative and survey data on 
population mental health 


All OECD countries collect mortality statistics on causes of death, including deaths
from suicide as well as from alcohol and drug overdoses. Statistics on causes
of death are typically collected by hospitals or health care providers, while police
authorities report deaths from suicides. The OECD already regularly publishes
statistics for its member countries on both deaths from suicide and other types of
deaths of despair (OECD, 2020[16]; OECD, 2021[17]).4


Administrative data on mental health go beyond death records. Hospital discharge 
registries that, depending on the country, may cover the length of hospitalisation and 
discharges by field of medical specialisation were mentioned by a number of 
countries, including Canada, Chile, Hungary, Italy, Slovenia, Switzerland and Türkiye.
Some countries, including Spain and the United Kingdom, collect care or clinical care 
data to measure prevalence and incidence of specific behavioural disorders. The 
Swedish Social Insurance Agency also collects data on causes of work absences, with 
a special category for sick leave following a psychiatric diagnosis. Finally, a handful of 
countries collect administrative data on psychiatric medication. For example, in 
France the Agence nationale de sécurité du médicament (ANSM) publishes data on 
psychotropic drugs delivered to outpatients; Statistics Netherlands provides data on 
dispensed medicines, including those related to mental health conditions as 
determined by ATC (anatomical therapeutic chemical) coding; Australia collects 
administrative data on dispensed medications covered under the Pharmaceutical 
Benefits Scheme; and the Slovenian National Institute of Public Health (NIJZ) hosts 
data on prescription drug claims, including for mental health-related drugs. 


In addition, all OECD countries that responded to the questionnaire reported collecting 
population-wide data on mental health outcomes through household surveys, already 
prior to the COVID-19 pandemic. While much of these data are collected through 
health interviews, 89% of countries reported also collecting mental health data in 
general social surveys (Figure 2.4). Some data on mental health are also collected 
through labour force surveys and special modules of the national census. Some 
countries also reported collecting mental health data in special surveys that focus on 
sub-populations, including Indigenous peoples, those in the criminal justice system 
and young people (see Box 2.2 for more information on the latter). 


Figure 2.4. The majority of OECD countries report measuring population 
mental health in both health and general social surveys 
Share of OECD countries that responded to a survey about population mental 
health 
Note: Results are shown for all OECD countries except Estonia, which did not participate
in the questionnaire. 
Source: Responses to an OECD questionnaire sent to national statistical offices in January 
2022.
StatLink https://stat.link/40bmf6


Many countries have launched surveys with mental health content since
the onset of COVID-19, but it is unclear whether these will continue in the
future


The pandemic has put mental health high on the national agenda for many OECD 
countries. As a result, most countries that answered the OECD questionnaire reported 
having ramped up data collection efforts on mental health in the months and years 
since March 2020. Around 68% of OECD countries reported collecting additional 
mental health data during the pandemic, either through new stand-alone surveys 
(43%) or by adding mental health and COVID-19 modules to existing surveys (35%) 
(see Table 2.3).5 Many of these new surveys are high-frequency, interviewing 
respondents weekly, biweekly, monthly or quarterly. However, it is unclear whether 
these surveys will continue in the future, or continue with the same frequency. 


Indeed, some COVID-specific surveys have already been discontinued by countries, 
while others that started off as weekly or monthly have since become less frequent 
(biweekly or quarterly). 


Before 2020, only 22% of countries collected mental health data on surveys that ran 
annually or more frequently, and 11% on surveys that ran every two to three years. 
Returning to pre-pandemic practice would mean that over half (51%) of countries
collect mental health data only every four to ten years. Such large gaps
between survey rounds make it more difficult to track changes at the population level
(which, as seen during the COVID-19 pandemic, were sensitive to periods of
intensifying COVID-19 deaths and strict confinement measures) and to craft policy
interventions accordingly.


Figure 2.5. Many OECD countries collect mental health data 
infrequently, with over half reporting four-to-ten-year lags between 
survey rounds 
Share of OECD countries that responded to a survey about population mental 
health 
Note: This figure considers only the most frequently run survey per country, rather than
the full set of surveys containing mental health data that countries report. It thus shows
the highest frequency at which mental health data are available, per country.
Results are shown for all OECD countries except Estonia, which did not participate in the
questionnaire.
Source: Responses to an OECD questionnaire sent to national statistical offices in January
2022.
StatLink https://stat.link/mboi94

Box 2.2. Initiatives to collect data on mental health for children and
youth


The mental health of young people suffered dramatically during the COVID-19
pandemic (OECD, 2021[18]; OECD, 2021[19]), and a number of OECD countries
launched campaigns focusing on youth mental health in 2021 and 2022 to help
combat increasing rates of suicide, reported anxiety, depression and general
psychological stress (HHS, 2021[20]; Chile, 2021[21]; Santé Publique France,
2021[22]). The results from the OECD questionnaire show that, although the
pandemic may have underscored the importance of focusing on young people, many
OECD countries were already implementing child or youth-specific surveys with
mental health modules (Table 2.2).


The measurement of child and youth mental health differs from that of adults in
several ways. Some surveys use the same tools for children and adults (questions
about previous diagnoses, standardised composite scales such as the WHO-5, and
negative affect questions); however, there are also some youth-specific validated
screening tools. A number of countries answering the OECD questionnaire reported
using the Strengths and Difficulties Questionnaire (SDQ), a behavioural screening tool
for children and youth aged three to 16, or the Development and Well-Being
Assessment (DAWBA), which screens for psychiatric diagnoses in children starting at
the age of two. Child and youth surveys often include modules covering behavioural and
emotional issues, adverse childhood experiences, positive childhood experiences and
substance use/abuse, and can contain questions that are posed to children, parents or
teachers (Table 2.11). Some surveys also cover previous diagnoses of attention deficit
hyperactivity disorder (ADHD) or autism spectrum disorder (ASD).


Table 2.2. Many countries have introduced child and youth surveys, or survey 
modules, with a mental health focus 


Country                                                        Survey
Australia                                                      Australian Child and Adolescent Survey of Mental Health and Wellbeing
Canada                                                         Canadian Health Survey of Children and Youth (CHSCY)
Germany                                                        Study on the Health of Children and Adolescents in Germany (KiGGS)
Italy                                                          Quality of Life in Children and Adolescents*
Luxembourg                                                     Youth Survey Luxembourg
United Kingdom                                                 Mental Health of Children and Young People Surveys
United States                                                  Youth Risk Behavior Survey (YRBS)
                                                               National Health Interview Survey (NHIS)†
Denmark, Ireland, Italy, Latvia, Luxembourg, Slovenia, Sweden  Health Behaviour in School-Aged Children (HBSC)


Note: The HBSC is a school-based survey, not a household survey. * indicates the survey was
introduced following the start of the pandemic (post-March 2020). † The NHIS includes the Strengths
and Difficulties Questionnaire (SDQ) in the child component of the rotating core module. Results are
shown for all OECD countries except Estonia, which did not participate in the questionnaire.


Source: Responses to an OECD questionnaire sent to national statistical offices in January 2022. 


Table 2.3. Over half of OECD countries reported increasing the collection of 
mental health data during the COVID-19 pandemic 


Country | Stand-alone COVID survey | COVID module added to existing survey | Any COVID-related survey
Australia ● ●
Austria
Belgium ● ●
Canada ● ●
Chile ● ●
Colombia ● ●
Costa Rica ● ●
Czech Republic
Denmark
Finland ● ●
France ● ● ●
Germany ● ● ●
Greece
Hungary
Iceland ● ●
Ireland ● ●
Israel ● ●
Italy ● ●
Japan
Korea ● ●
Latvia
Lithuania
Luxembourg ● ●
Mexico ● ●
Netherlands ● ●
New Zealand ● ●
Norway ● ●
Poland
Portugal
Slovak Republic
Slovenia ● ●
Spain ● ●
Sweden ● ● ●
Switzerland ● ●
Türkiye
United Kingdom ● ●
United States ● ● ●


Note: Results are shown for all OECD countries except Estonia, which did not participate in the
questionnaire.

Source: Responses to an OECD questionnaire sent to national statistical offices in January 2022. 
The focus of household surveys is mainly on mental ill-health 


All OECD countries collect data on both mental ill-health and positive mental health 
outcomes. For the former, there is much variety in terms of both the tools used and 
outcomes measured, whereas for the latter cross-country comparative data are 
mainly limited to measures of life evaluation (Figure 2.6); 59% of countries reported 
collecting data on affect, and only 24% on eudaimonia. 


Figure 2.6. All OECD countries reported collecting data on mental ill- 
health and positive mental health, with the latter mostly focused on 
life evaluation 
Share of OECD countries that responded to a survey about population mental 
health which report collecting data on various population mental health 
outcomes 
Note: Results are shown for all OECD countries except Estonia, which did not participate
in the questionnaire. Note that the question collected during the EU-SILC 2013 ad hoc 
well-being module, on the extent to which respondents feel that their life is worthwhile, 
was not included in this figure given that the question was removed from subsequent 
well-being modules. 

Source: Responses to an OECD questionnaire sent to national statistical offices in January 
2022. 


StatLink https://stat.link/cghuny 


Mental ill-health outcome measures are captured through a variety of tools. The two 
tools most often reported by countries are screening tools and questions about 
experienced symptoms or disorders (either general or specific), with 97% and 78% of 
countries reporting using these types of tools in household surveys, respectively 
(Figure 2.7). Over half of countries (62%) ask single questions about people’s general 
mental health status. Many fewer countries report collecting data on previous 
diagnoses in household surveys (30%) or in structured interviews (16%). 


Figure 2.7. Screening tools and questions about experience of 
symptoms and disorders are the most common mental ill-health tools 
reported by countries 
Share of OECD countries that responded to a survey about population mental 
health that measure mental ill-health by each type of tool 
Note: Results are shown for all OECD countries except Estonia, which did not participate
in the questionnaire. 
Source: Responses to an OECD questionnaire sent to national statistical offices in January 
2022. 
StatLink https://stat.link/5twjx3 
General psychological distress and symptoms of depression tend to be 
captured by standardised screening tools, whereas measures of 
experiencing anxiety are often not harmonised across countries 


Within the continuum of mental ill-health, existing measurement initiatives focus
more on some forms of mental health issues than on others. Anxiety and depressive
disorders are the most common mental health conditions affecting people in OECD
countries (OECD/European Union, 2018[23]). While 86% of countries (32 out of 37)
have a dedicated validated screening tool for measuring symptoms of depression, and
95% have one for general psychological distress (35 out of 37), only 41% rely on a
screening tool for symptoms of anxiety (15 out of 37) (Figure 2.8). Screening tools
used by countries vary widely in length, ranging from two to 40
questions (see Table 2.5).


Variants of the PHQ are the most common screening tool for measuring symptoms of 
depression, used by 84% (31 out of 37) of countries. The MHI-5 is the most common 
screening tool for general psychological distress, used by 76% of countries (28 out of 
37). In both instances, this is largely driven by Eurostat, which harmonises the data 
collection efforts of European Union member countries: 26 of the 28 countries that 
rely on the MHI-5 participate in Eurostat, all but Australia and Israel.6 The PHQ-8 has 
been included in Eurostat’s European Health Interview Survey (EHIS), which is 
conducted every five to six years. Variants of the PHQ are also used by a number of 
non-European OECD countries (see Table 2.5). 
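
The percentages in the two paragraphs above are simple shares of the 37 responding countries. As a minimal sketch, the following Python snippet recomputes them from the country counts quoted in the text, which can be a useful sanity check when updating the figures with new questionnaire rounds. The dictionary labels and the helper name `share_pct` are hypothetical and chosen for illustration only.

```python
from typing import Dict

RESPONDING_COUNTRIES = 37  # all OECD countries except Estonia, as stated in the notes


def share_pct(count: int, total: int = RESPONDING_COUNTRIES) -> int:
    """Return count/total as a percentage rounded to the nearest whole number."""
    return round(100 * count / total)


# Country counts quoted in the text (label -> number of countries).
reported_counts: Dict[str, int] = {
    "screening tool for depression": 32,
    "screening tool for general psychological distress": 35,
    "screening tool for anxiety": 15,
    "PHQ variants": 31,
    "MHI-5": 28,
}

shares = {label: share_pct(n) for label, n in reported_counts.items()}

for label, pct in shares.items():
    print(f"{label}: {pct}% ({reported_counts[label]} out of {RESPONDING_COUNTRIES})")
```

The recomputed shares (86%, 95%, 41%, 84% and 76%) agree with those reported in the text.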


Figure 2.8. Screening tools capturing general psychological distress and 
symptoms of depression are more commonly used than those for 
symptoms of anxiety or other disorders 


Share of OECD countries that responded to a survey about population mental 
health and that include measures of risk for mental ill-health in their household 
surveys, only validated screening tools 
Note: Results are shown for all OECD countries except Estonia, which did not participate
in the questionnaire. Note that the MHI-5 and PHQ-8 findings are partly driven by 
Eurostat, although a number of other non-European OECD countries also use these, 
especially the PHQ-8. The MHI-5 will not be repeated in future EU-SILC ad hoc well-being 
modules, which will reduce the share of countries regularly collecting it. 

Source: Responses to an OECD questionnaire sent to national statistical offices in January 
2022. 


StatLink https://stat.link/exyhv4 


OECD countries also collected data on symptoms of anxiety, although often through 
country-specific tools rather than validated screening tools (Figure 2.9). 70% of 
countries report capturing anxiety outcomes, through some combination of structured 
interviews, questions about previous diagnoses or about experience of anxiety 
disorders, affect data or validated screening tools. Considering all measurement tools 
included in surveys, more countries indicated using them primarily for measuring 
symptoms of depression. The only exception is questions about negative affect, for
which usage is evenly divided: 30% of countries reported using negative affect to 
measure both anxiety (feeling nervous, anxious) and depression (feeling low, 
downhearted). 


The focus of measurement initiatives on depressive and anxiety disorders reflects the
fact that they are some of the most prevalent mental health
conditions (OECD/European Union, 2018[23]), and that they contribute highly to the
disease burden globally and in OECD countries (Santomauro et al., 2021[24]). Data
collection efforts for other specific mental conditions - such as PTSD, bipolar disorder,
eating disorders, etc. - remain very uneven across OECD countries (Figure 2.8).


Figure 2.9. Countries do capture anxiety data, but often with non- 
standardised measures 


Share of OECD countries that responded to a survey about population mental 
health and that include measures of symptoms of depression or anxiety in their 
household surveys, all tool types 
Note: Results are shown for all OECD countries except Estonia, which did not participate
in the questionnaire. 
Source: Responses to an OECD questionnaire sent to national statistical offices in January 
2022. 
StatLink https://stat.link/6plesa 

Most countries collect comparative data on life evaluation, but less so on
affect and eudaimonia


Almost all OECD countries collect some data on life evaluation, primarily through a 
question on self-reported life satisfaction. Other aspects of positive mental health - 
affect and eudaimonia - are much less frequently covered by surveys undertaken by 
OECD countries; even when they are, the tools used are less standardised across 
countries (Figure 2.10). Measures of affect are more commonly collected than measures of
eudaimonia; 59% of countries collect some form of affect data, through a combination
of standardised composite scales and non-harmonised questions, while only 24% 
collect data on eudaimonia. In terms of standardised tools for measuring positive 
mental health outcomes, the SF-12 (and the SF-36 sub-component on energy and 
vitality, EVI), WHO-5 and either WEMWBS or its shorter form SWEMWBS are the three 
most common instruments; however, their overall use is still low: 30%, 16% and 19% 
of countries reported using each scale in a household survey, respectively. 


Figure 2.10. Affect data are more commonly collected than eudaimonic 
data, but OECD countries are not aligned in the tools used to collect 
data on positive mental health beyond life satisfaction 
Share of OECD countries that responded to a survey about population mental 
health and that include measures of positive well-being in their household 
surveys, all tool types by outcome measure
Note: Results are shown for all OECD countries except Estonia, which did not participate
in the questionnaire. Note that the question collected during the EU-SILC 2013 ad hoc 
well-being module, on the extent to which respondents feel that their life is worthwhile, 
was not included in this figure given that the question was removed from subsequent 
well-being modules. 
Source: Responses to an OECD questionnaire sent to national statistical offices in January 
2022. 
StatLink https://stat.link/I8kq9v 
There are interesting recent developments in topics such as data 
collection on mental health awareness 


Overall, data collection efforts on additional mental-health related topics (e.g. use of 
mental health medication and services; mental health of children and young people in 
the household; loneliness and stress; resilience and self-efficacy; attitudes towards 
mental health, including stigma and literacy; and questions on unmet needs) are also 
uneven across countries (see Table 2.11). Many of these issues are not yet well-
defined conceptually, with few internationally standardised tools available. For
instance, only 30% of countries reported collecting (very different) indicators covering 
the topics of mental health stigma, discrimination, literacy and knowledge of mental 
health issues and resources.7 However, some countries have recently launched new 
survey efforts - and developed new methods - given increased interest in mental 
health awareness. For instance, in 2021 Sweden’s Public Health Agency conducted an 
online population survey, covering more than 10 000 respondents, on knowledge and 
attitudes about mental illness and suicide (Public Health Agency Sweden, 2022[25]). 
After systematically reviewing more than 400 existing instruments for measuring 
mental health stigma and conducting cognitive testing, the Public Health Agency 
concluded that the overwhelmingly negative tone of existing measures was in itself 
stigmatising and focused mostly on examples of severe mental illness. They hence 
decided to develop their own survey: the final questionnaire included items that were 
designed as semantic differentials (word pairs) that captured both positive and 
negative perceptions of mental illness and focused on all forms of mental illness, 
including more common experiences of depression, anxiety and stress-related 
conditions (Public Health Agency Sweden, 2022[25]).


Conclusion and ways forward 


Measuring population mental health outcomes is not a new field for producers of 
official data in OECD countries, and many national statistical offices and health 
agencies were already collecting relevant data well before COVID-19. Nevertheless, it 
is also clear that there is room for improvement moving forward. 


First, some aspects of mental health are measured more frequently than others, and 
there is scope for better cross-country harmonisation. The results of the OECD 
questionnaire to official data producers suggest that existing data collection efforts 
are not capturing the full range of mental health outcomes - missing aspects of both
mental ill-health and positive mental health. While 86% of countries use a
screening tool for symptoms of depression, and 95% for general psychological 
distress, only 41% use a standardised screening tool for symptoms of anxiety - and 
generalised anxiety disorder, along with mood disorders, is one of the most common 
mental health conditions affecting people in OECD countries. Data collection efforts 
for other specific mental conditions - such as post-traumatic stress disorder, bipolar 
disorder, eating disorders, etc. - remain very uneven across countries. When it comes 
to positive mental health, almost all countries gather some form of life evaluation 
data, but information about affect and eudaimonia is much less frequently collected 
(by 59% and 24% of countries, respectively), and often not in a standardised manner. 
Data producers could hence as a first step expand their use of screening tools to 
those that include symptoms of anxiety, as well as depression, and move towards 
more harmonisation for affective and eudaimonic aspects of positive mental health. 


Second, it will be important to measure mental health outcomes regularly, and to 
keep up some of the momentum provided by the high frequency surveys with mental 
health modules initiated during the first two years of the pandemic. Given the trade- 
offs between response burden and accuracy that data producers face when choosing 
between different tools to measure mental health outcomes, adding a single question 
about people’s general mental health status to frequently conducted population 
surveys could be a way to gather this information regularly and help link data across 
surveys. Over half of countries (62%) already include such single items in surveys, 
though question wording varies widely. Canada has been an early leader in 
developing single-item self-reported mental health (SRMH) indicators, and its question 
formulation has already been adopted by Chile and Germany, which could make it a 
useful model for other countries moving forward. While questions about previous 
diagnoses received by health care professionals are also short, evidence suggests 
that they focus mostly on people who have been in touch with the health system and 
hence are better placed in health surveys only. 


Chapter 3 reviews the available evidence on the statistical quality of these 
recommended tools in further detail and provides suggestions for three concrete 
measures that countries could adapt to maximise international harmonisation and 
minimise response burden. 


Lastly, whichever results are communicated to policy makers or the general public, it 
is essential to be transparent as to which exact aspect of mental health is being 
measured, including which areas a specific tool covers and does not cover (e.g. only 
previous diagnosis? only affect, or also somatic symptoms, and if so, which ones?). 
This information is important to contextualise findings and to provide transparency as 
to any limitations that might impact the interpretation of results. 


Annex 2.A. Mental health survey measures by country


Table 2.4. Overview of structured interviews to monitor mental health conditions 


Focus: Diagnosis of mental condition according to ICD-10 and DSM-IV
Tool: Composite International Diagnostic Interview
Abbreviation: CIDI
Number of items: More than 300 symptom questions, but because of skip rules not all of them are asked to every respondent
Frame of reference: -
Time to complete: 75 mins
Already collected by: Australia, Canada, Chile, Germany, Korea, United States (depressive symptoms only)


Note: Results are shown for all OECD countries except Estonia, which did not participate in the 
questionnaire. For more details on the tool, see Annex 2.B. 


Source: Responses to an OECD questionnaire sent to national statistical offices in January 2022. 


Table 2.5. Overview of validated screening tools to monitor both general mental 
ill-health and risk for specific mental health conditions 


Focus: Psychological distress
Covers: Negative and positive affect
Tool: Mental Health Inventory-5
Abbreviation: MHI-5
Number of items: 5
Frame of reference: Past month
Already collected by country: Australia, Austria, Belgium, Czech Republic, Denmark, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Israel, Italy, Latvia, Lithuania, Luxembourg, Netherlands, Norway, Poland, Portugal, Slovak Republic, Slovenia, Spain, Sweden, Switzerland, Türkiye, United Kingdom

Focus: Psychological distress
Covers: Negative affect, functional impairment
Tool: Kessler Scale 10
Abbreviation: K10
Number of items: 10
Frame of reference: Past 4 weeks
Already collected by country: Australia, Canada, Netherlands, New Zealand

Focus: Psychological distress
Covers: Negative affect
Tool: Kessler Scale 6
Abbreviation: K6
Number of items: 6
Frame of reference: Past 4 weeks
Already collected by country: Australia, Japan, Sweden, United States

Focus: Psychological distress
Covers: Negative and positive affect, somatic symptoms, functional impairment
Tool: General Health Questionnaire
Abbreviation: GHQ-12
Number of items: 12
Frame of reference: Recently
Already collected by country: Australia, Belgium, Finland, Spain, United Kingdom

Focus: Symptoms of depression and anxiety
Covers: Negative affect, anhedonia, functional impairment
Tool: Patient Health Questionnaire-4
Abbreviation: PHQ-4
Number of items: 4 (2 depression, 2 anxiety)
Frame of reference: Past 2 weeks
Already collected by country: Australia, Belgium, Canada, Chile, Finland, France, Germany, Iceland, Korea, Slovenia, Switzerland, United Kingdom, United States

Focus: Symptoms of depression and anxiety
Covers: Negative and positive affect, anhedonia
Tool: Hospital Anxiety and Depression Scale
Abbreviation: HADS
Number of items: 14 (7 depression, 7 anxiety)
Frame of reference: Past week
Already collected by country: France

Focus: Symptoms of depression and anxiety
Covers: Negative affect
Tool: Hopkins Symptom Checklist
Abbreviation: HSCL-5
Number of items: 5
Frame of reference: Past week
Already collected by country: Norway

Focus: Symptoms of depression and anxiety
Covers: Negative affect, anhedonia, somatic symptoms, functional impairment
Tool: Depression, Anxiety and Stress Scale
Abbreviation: DASS-21
Number of items: 21 (7 depression, 7 anxiety, 7 chronic non-specific stress)
Frame of reference: Past week
Already collected by country: Australia, Italy

Focus: Symptoms of depression and anxiety among the general and disabled population
Covers: Negative affect, functional impairment
Tool: Washington Group on Disability Statistics Short Set on Functioning - Enhanced
Abbreviation: WG-SS Enhanced
Number of items: 12 (2 depression, 2 anxiety)
Frame of reference: General
Already collected by country: Australia, Canada, New Zealand, United States

Focus: Symptoms of depression and anxiety among the general and disabled population
Covers: Negative affect, functional impairment
Tool: Washington Group Extended Set on Functioning
Abbreviation: WG-ES
Number of items: 37 (3 depression, 3 anxiety)
Frame of reference: General
Already collected by country: United States

Focus: Depressive symptoms
Covers: Negative affect, anhedonia, somatic symptoms, functional impairment (matched to major depressive disorder per DSM-IV and DSM-5 criteria)
Tool: Patient Health Questionnaire-8
Abbreviation: PHQ-8
Number of items: 8
Frame of reference: Past 2 weeks
Already collected by country: Austria, Czech Republic, Denmark, Germany, Greece, Hungary, Iceland, Ireland, Italy, Latvia, Lithuania, Luxembourg, Netherlands, Norway, Poland, Portugal, Slovak Republic, Slovenia, Spain, Sweden, Türkiye, United Kingdom, United States

Focus: Depressive symptoms
Covers: Negative affect, anhedonia, somatic symptoms, functional impairment (matched to major depressive disorder per DSM-IV and DSM-5 criteria)
Tool: Patient Health Questionnaire-9
Abbreviation: PHQ-9
Number of items: 9 (PHQ-8 + question on suicidal ideation)
Frame of reference: Past 2 weeks
Already collected by country: Australia, Belgium, Canada, Finland, France, Germany, Italy, Korea, Slovenia, Switzerland, United States

Focus: Depressive symptoms
Covers: Negative affect, anhedonia
Tool: Patient Health Questionnaire-2
Abbreviation: PHQ-2
Number of items: 2
Frame of reference: Past 2 weeks
Already collected by country: Australia, Canada, Chile, Finland, Germany, Italy, Norway, United States

Focus: Depressive symptoms
Covers: Negative and positive affect, anhedonia, somatic symptoms, functional impairment, interpersonal challenges
Tool: Center for Epidemiological Studies Depression Scale
Abbreviation: CES-D
Number of items: 20
Frame of reference: Past week
Already collected by country: Mexico

Focus: Symptoms of depression among recent mothers
Covers: Negative and positive affect, anhedonia, functional impairment
Tool: Edinburgh Post-natal Depression Scale
Abbreviation: EPDS
Number of items: 6
Frame of reference: Past week
Already collected by country: Italy

Focus: Symptoms of anxiety
Covers: Negative affect, somatic symptoms, functional impairment
Tool: Generalised Anxiety Disorder-7
Abbreviation: GAD-7
Number of items: 7
Frame of reference: Past 2 weeks
Already collected by country: Australia, Belgium, Canada, Finland, France, Germany, Iceland, Korea, Slovenia, Switzerland, United States

Focus: Symptoms of anxiety
Covers: Negative affect, functional impairment
Tool: Generalised Anxiety Disorder-2
Abbreviation: GAD-2
Number of items: 2
Frame of reference: Past 2 weeks
Already collected by country: Australia, Canada, Chile, Germany, Mexico, United Kingdom, United States

Focus: Symptoms of anxiety (including panic-like anxiety)
Covers: Negative affect, functional impairment, subjective well-being
Tool: The State and Trait Anxiety Scale
Abbreviation: STAI
Number of items: 40 (20 state anxiety, 20 trait anxiety)
Frame of reference: State anxiety: “in this moment”; trait anxiety: “generally”
Already collected by country: Italy

Focus: Symptoms of panic disorder
Covers: Presence and severity of anxiety attacks, somatic symptoms
Tool: Patient Health Questionnaire-Panic Disorder
Abbreviation: PHQ-PD
Number of items: 15
Frame of reference: Past 4 weeks
Already collected by country: Germany, Switzerland

Focus: Symptoms of post-traumatic stress disorder (PTSD)
Covers: Presence and severity of PTSD symptoms (matched to DSM-5 criteria)
Tool: PTSD Checklist for DSM-5
Abbreviation: PCL-5
Number of items: 20
Frame of reference: Past 4 weeks
Already collected by country: Canada

Focus: Symptoms of PTSD
Covers: Presence and severity of PTSD symptoms (matched to DSM-5 criteria)
Tool: Primary Care PTSD Screen for DSM-5
Abbreviation: PC-PTSD-5
Number of items: 5
Frame of reference: Past 4 weeks
Already collected by country: Switzerland

Focus: Symptoms of PTSD
Covers: Presence and severity of PTSD symptoms (matched to DSM-IV criteria)
Tool: Impact of Event Scale - revised
Abbreviation: IES-R
Number of items: 22
Frame of reference: Past week
Already collected by country: Italy

Focus: Symptoms of agoraphobia
Covers: Presence and severity of anxiety related to different aspects of everyday life
Tool: Angstbarometer
Abbreviation: Angstbarometer
Number of items: 12
Frame of reference: Past year
Already collected by country: Switzerland

Focus: Symptoms of social anxiety disorder
Covers: Presence and severity of symptoms of social anxiety disorder
Tool: Mini-Social Phobia Inventory
Abbreviation: Mini-SPIN
Number of items: 3
Frame of reference: Past week
Already collected by country: Finland, Switzerland

Focus: Symptoms of substance abuse or addiction
Covers: Presence and severity of symptoms of alcoholism
Tool: CAGE Substance Abuse Screening Tool
Abbreviation: CAGE
Number of items: 4
Frame of reference: No specific recall period
Already collected by country: Belgium

Focus: Symptoms of substance abuse or addiction
Covers: Presence and severity of symptoms of alcoholism
Tool: Alcohol Use Disorders Identification Test-Concise
Abbreviation: AUDIT-C
Frame of reference: No specific recall period
Already collected by country: Chile, Sweden

Focus: Symptoms of substance abuse or addiction
Covers: Presence and severity of symptoms of alcoholism
Tool: Alcohol Use Disorders Identification Test
Abbreviation: AUDIT
Number of items: 10
Frame of reference: No specific recall period
Already collected by country: France, Spain

Focus: Symptoms of substance abuse or addiction
Covers: Presence and severity of Internet addiction and compulsive, pathological, or problematic online behaviours (matched to DSM-IV criteria for substance addiction and pathological gambling)
Tool: Compulsive Internet Use Scale
Abbreviation: CIUS
Number of items: 14
Frame of reference: No specific recall period
Already collected by country: Switzerland

Focus: Symptoms of eating disorders
Covers: Presence and severity of symptoms of anorexia nervosa and bulimia nervosa
Tool: SCOFF
Abbreviation: SCOFF
Frame of reference: Past 3 months
Already collected by country: Belgium, Finland, Germany

Focus: Symptoms of eating disorders
Covers: Presence and severity of symptoms of binge eating disorder, bulimia nervosa and recurrent binge eating
Tool: Patient Health Questionnaire-Eating Disorder Module
Abbreviation: PHQ-ED
Frame of reference: Past 3 months
Already collected by country: France

Focus: Symptoms of eating disorders in 7-17 year-olds
Covers: Presence and severity of symptoms of eating disorders
Tool: Screening questions from the Development and Wellbeing Assessment - Eating Disorder Module
Abbreviation: DAWBA
Frame of reference: No specific recall period
Already collected by country: United Kingdom

Note: Countries in italics are those that have explicitly stated that they no longer collect the
measure in question. Countries in bold did not report collecting the instrument in their official
questionnaire submission; however, it was added by the OECD Secretariat based on the country’s
participation in the European Health Interview Survey (EHIS), which contains the PHQ-8 as a core
module. The PHQ-4 country practice was added by the Secretariat for countries collecting both
the PHQ and GAD (from which the PHQ-4 pulls its indicators), regardless of individual country
reporting on the PHQ-4 itself. Results are shown for all OECD countries except Estonia, which did
not participate in the questionnaire. Data for the United Kingdom include only surveys carried out
by the Office for National Statistics on mental health and do not include the data collected by
devolved administrations. For details of the tools collected by at least two OECD countries,
see Annex 2.B.


Source: Responses to an OECD questionnaire sent to national statistical offices in January 2022. 


Table 2.6. Overview of questions about previous diagnoses 


Category: Received diagnosis of any mental health condition
Example question framing:
Have you been told by a doctor or nurse that you have any of these long-term health conditions? List: Mental health condition (including depression or anxiety) (AUS)
Have your mental health problems ever been diagnosed as a mental disorder by a professional (psychiatrist, doctor, clinical psychologist)? (SVN)
Answer options: Yes/No
Frame of reference: Lifetime
Already collected by country: Australia, Slovenia

Category: Received diagnosis of any mood disorder (including depression)
Example question framing:
Have you ever in your life been diagnosed by a doctor with any of the following health problems or illnesses? In the event that you have been diagnosed any of them, have you received or are you undergoing medical treatment? Depression or anxiety (CHL)
During your life, has a doctor ever told you that you had a psychiatric or psychological disorder or an addiction? Depression or depressive episode (FRA)
Answer options: Yes/No
Frame of reference: Lifetime, last 12 months, or during COVID
Already collected by country: Australia, Austria, Canada, Chile, Costa Rica, France, New Zealand, Slovenia, Spain, United States

Category: Received diagnosis of anxiety disorder
Example question framing:
Have you ever been told by a doctor, nurse or other health professional that you have any of these conditions? Anxiety (AUS)
Has a health professional ever told you that you have...? Chronic anxiety (CRI)
During your life, has a doctor ever told you that you had a psychiatric or psychological disorder or an addiction? Anxiety disorder (generalised anxiety, phobia, obsessive compulsive disorder, etc.) (FRA)
Answer options: Yes/No
Frame of reference: Lifetime, during COVID
Already collected by country: Australia, Chile, Costa Rica, France, New Zealand, Slovenia, Spain, United States

Category: Received diagnosis of bipolar disorder or mania
Example question framing:
Have you ever been told by a doctor that you have bipolar disorder, which is sometimes called manic depression? (NZL)
Answer options: Yes/No
Frame of reference: Lifetime, during COVID
Already collected by country: Australia, France, New Zealand, Slovenia

Category: Received diagnosis of post-traumatic stress disorder (PTSD)
Example question framing:
Have you ever been diagnosed with PTSD? (CAN)
Answer options: Yes/No
Frame of reference: Lifetime
Already collected by country: Australia, Canada

Category: Received diagnosis of obsessive compulsive disorder (OCD)
Example question framing:
Have your mental health problems ever been diagnosed as a mental disorder by a professional (psychiatrist, doctor, clinical psychologist)? Obsessive compulsive disorder (SVN)
Answer options: Yes/No
Frame of reference: Lifetime
Already collected by country: Australia, Slovenia

Category: Received diagnosis of schizophrenia or other psychotic disorders
Example question framing:
During your life, has a doctor ever told you that you had a psychiatric or psychological disorder or an addiction? Schizophrenia (FRA)
Answer options: Yes/No
Frame of reference: Lifetime, during COVID
Already collected by country: France, Slovenia

Category: Received diagnosis of personality disorder
Example question framing:
During your life, has a doctor ever told you that you had a psychiatric or psychological disorder or an addiction? Borderline personality disorder (FRA)
Answer options: Yes/No
Frame of reference: Lifetime, during COVID
Already collected by country: France


Received diagnosis of agoraphobia or social disorder

Received diagnosis of addictive disorder or substance abuse problems

Received diagnosis of an eating disorder

Received diagnosis of conduct disorder or behavioural / emotional problems

Neurodiversity: received diagnosis of attention deficit hyperactivity disorder (ADHD)

Neurodiversity: received Have you ever been diagnosed Yes/No


Were you told by a doctor, nurse Yes/No 
or other health professional that 

you had [...] mental health 

condition? Agoraphobia 


(AUS) 


Have you ever been told by a Yes / No 
doctor, nurse or other health 

professional that you have any of 

these conditions? Harmful use or 
dependence on alcohol or drugs 


(AUS) 


During your life, has a doctor 
ever told you that you had a 
psychiatric or psychological 
disorder or an addiction? 
Addiction or addictive disorder 


(FRA) 


Have your mental health Yes / No 
problems ever been diagnosed as 

a mental disorder by a 

professional (psychiatrist, doctor, 

clinical psychologist)? Eating 

disorder 


(SVN) 


Have you ever been told by a Yes / No 
doctor, nurse or other health 

professional that you have any of 

these conditions? Behavioural or 
emotional problems 


(AUS) 


Have you ever been diagnosed 
with conduct disorders by a 
medical professional? 


(ESP) 


Have [you/name] ever been told Yes/No 
by a doctor or other health 

professional that {you/he/she} 

had attention deficit hyperactivity 
disorder (ADHD) or attention 

deficit disorder (ADD)? 


(USA) 


diagnosis of autism with autism by a medical 


spectrum disorder (ASD) 


professional? 


Lifetime 


Lifetime, 
during COVID 


Lifetime, 
during COVID 


Lifetime 


Lifetime 


Australia 


Australia, France 


France, Slovenia 


Australia, Spain 


Germany, United 
States 


Lifetime Spain 


(ESP) 


Received diagnosis of any Do you have any other long-term Yes/No Lifetime, last Australia, 


other mental health physical or mental health 12 months, Canada, Chile, 
condition condition that has been or during Costa Rica, 
diagnosed by a health COVID France, Slovenia 
professional? 
(CAN) 


Have your mental health 
problems ever been diagnosed as 
a mental disorder by a 
professional (psychiatrist, doctor, 
clinical psychologist)? 


(SVN) 
Note: Results are shown for all OECD countries except Estonia, which did not participate in the 
questionnaire. Data for the United Kingdom include only surveys carried out by the Office for 
National Statistics on mental health and do not include the data collected by devolved 
administrations. 


Source: Responses to an OECD questionnaire sent to national statistical offices in January 2022. 


Table 2.7. Overview of questions about experienced symptoms and mental health 
conditions 


Example question Answer Frame of 


Category framing options reference Already collected by country 
Self-reported Have you suffered Yes/No Lifetime, Hungary, Israel, Slovenia, Sweden, United 
mental health from psychological last 12 States 
problems stress or an acute months, last 

illness in the last 3 months 
three months? 


(ISR) 


Are you currently 
facing mental health 
problems? 


(SVN) 


Do you have your 
own experience with 
mental illness? 


(SWE) 


Do you think you 
ever had a problem 
with your own 
mental health? 


(USA) 


Self-reported During the past 12 Yes/No Lifetime, Australia, Austria, Belgium, Canada, 
mood disorder months, have you last 12 Costa Rica, Czech Republic, 
(depression, had any of the months, Denmark, France, Germany, Greece, 
etc.) or mood following diseases or current Hungary, Iceland, Ireland, Italy, 


disorder 
symptoms 


Self-reported 
anxiety 
disorder, or 
anxiety 
symptoms 


Self-reported 
bipolar 
disorder or 
mania 


conditions? 
Depression 


(European OECD 
countries 
participating in 
EHIS) 


Do you have a mood 
disorder? 


(CAN) 


Next I will ask you 
some questions 
related to different 
chronic diseases or 
health conditions 
that you may 
currently have. 
Chronic diseases are 
those of long 
duration and usually 
evolve slowly. 


Do you have chronic 
depression? 


(CRI) 


Do you have an Yes / No 
anxiety disorder? 


(CAN) 


During the last 12 
months did you 
have or do you have 
any of the chronic 
diseases / diseases 
that 


are listed: Anxiety 
disorders (e.g. panic 
attacks, anxiety) 


(GRC) 


Have you ever 
suffered from 
chronic anxiety? 


(ESP) 


Do you have any of Yes/No 
these conditions? 
Bipolar disorder 


(AUS) 


Do you have a mood 
disorder such as 
depression, bipolar 
disorder, mania or 


Lifetime, 
last 12 


Latvia, Lithuania, 

Luxembourg, Netherlands, Norway, 
Poland, Portugal, Slovak 

Republic, Slovenia, Spain, Sweden, 
Turkiye, United Kingdom, United States 


Australia, Canada, Costa Rica, Greece, 
Hungary, Norway, Slovenia, Spain, 


months, last Sweden 


3 months, 
current 


Lifetime 


Australia, Canada 


Self-reported 
PTSD 


Self-reported 
OCD 


Self-reported 
schizophrenia 
or other 
psychotic 
disorders 


Self-reported 
agoraphobia or 
social disorder 


Self-reported 
addictive 
disorder or 
substance 
abuse 
problems 


dysthymia? 
(CAN) 


Do you currently 
experience 
symptoms of PTSD? 


(CAN) 


Do you have any of 
these conditions? 
Obsessive- 
compulsive disorder 
(OCD) 


(AUS) 


(Apart from any 
conditions you have 
told me about) do 
you have any other 
mental health, 
behavioural or 
cognitive conditions, 
such as these? 
Schizophrenia 


(AUS) 


[Do you have] 
Schizophrenia, 
schizotypal and 
delusional disorders 


(HUN) 


Do you have any of 
these conditions? 
Agoraphobia 


(AUS) 


(Apart from any 
conditions you have 
told me about) do 
you have any other 
mental health, 
behavioural or 
cognitive conditions, 
such as these? 
Dependence on 
alcohol; 
Dependence on 
drugs; Harmful use 
or dependence on 
medicinal, 
prescription drugs 


(AUS) 


Yes / No 


Yes / No 


Yes / No 


Yes / No 


Yes / No 


Lifetime Australia, Canada 


Lifetime Australia 


Lifetime or Australia, Hungary 
last 12 
months 


Lifetime Australia 


Lifetime or Australia, Hungary 
last 12 
months 


Self-reported 


In the past 12 


eating disorder months, how often 


Self-reported 
conduct 
disorder or 
behavioural / 
emotional 
problems 


Self-reported 
ADHD 


Self-reported 
ASD 


Self-reported 
dementia 


Self-reported 
intellectual 


have you done the 
following things? 


a. Been preoccupied 
with a desire to be 
thinner 


b. Vomited to lose 
weight 


c. Changed your 
eating habits in 
order to manage 
your weight 


(CAN) 


Have you suffered 
from conduct 
disorders in the last 
12 months? 


(ESP) 


(Apart from any 
conditions you have 
told me about) do 
you have any other 
mental health, 
behavioural or 
cognitive conditions, 
such as these? 
Attention Deficit 
Hyperactivity 
Disorder (ADHD) 


(AUS) 


Have you suffered 
from autism in the 
last 12 months? 


(ESP) 


(Apart from any 
conditions you have 
told me about) do 
you have any other 
mental health, 
behavioural or 
cognitive conditions, 
such as these? 
Dementia, including 
Alzheimer's Disease 


(AUS) 


(Apart from any 
conditions you have 


Never/A Last 12 

few months 

times / 

Monthly / 

Weekly / 

Daily 

Yes / No Lifetime or 
last 12 
months 

Yes / No Lifetime 

Yes / No Lifetime or 
last 12 
months 

Yes / No Lifetime 

Yes / No Lifetime 


Canada 


Australia, Spain 


Australia 


Australia, Spain 


Australia 


Australia 


impairment told me about) do 
you have any other 
mental health, 
behavioural or 
cognitive conditions, 
such as these? 
Intellectual 
impairment, mental 
retardation 


(AUS) 
Self-reported (Apart from any Yes/No Lifetime Australia 
learning conditions you have 
disorder told me about) do 


you have any other 
mental health, 
behavioural or 
cognitive conditions, 
such as these? 
Learning difficulties, 
including dyslexia 


(AUS) 
Self-reported (Apart from any Yes/No Lifetime or Australia, Costa Rica, Hungary 
other mental conditions you have last 12 
disorder told me about) do months 


you have any other 
mental health, 
behavioural or 
cognitive conditions, 
such as these? Any 
other mental or 
behavioural 
condition 


(AUS) 
Note: Results are shown for all OECD countries except Estonia, which did not participate in the 
questionnaire. Data for the United Kingdom include only surveys carried out by the Office for 
National Statistics on mental health and do not include the data collected by devolved 
administrations. 


Source: Responses to an OECD questionnaire sent to national statistical offices in January 2022. 


Table 2.8. Overview of questions about suicidal ideation and suicide attempts 


Example question Frame of Already collected by 


Category framing Answer options reference country 
Suicidal ideation Final question of the Not at all / Several Last 2 weeks Australia, Belgium, 
days / More than Canada, Finland, 
PHQ-9 half the days / France, Germany, Italy, 
Nearly every day Korea, Slovenia, 
Switzerland, United 
States 
Suicidal ideation Have you seriously Yes / No Lifetime, last Australia, Belgium, 


contemplated suicide since 12 months, Canada, Chile, Finland, 


the COVID-19 pandemic during COVID France, Korea, Mexico, 


began? Slovenia, Sweden, 
Switzerland, United 
(CAN) States 


Have you had this 
experience [seriously 
considering suicide] in the 
last 12 months? 


(CHL) 


In the last 12 months, have 
you thought about 
committing suicide? 


(FRA) 


Have you ever been in a 
situation where you 
seriously considered taking 
your own life? 


(SWE) 
Self-harm Sometimes people harm Yes / No Lifetime, last Australia, Canada, 
behaviours themselves on purpose but 12 months, Finland, Greece, 


they do not mean to take Not at all / Several last 2 weeks Mexico 
their life. In the past 12 days/ More than 

months, did you ever harm half the days/ 

yourself on purpose but not Nearly every day 

mean to take your life? 


(CAN) 


During the past 2 weeks, 
how often did you have 
thoughts of hurting 


yourself? 
(GRC) 

Suicide attempts Did you attempt to commit Yes / No Lifetime, last Australia, Belgium, 
suicide in the last 12 12 months, Canada, Chile, Finland, 
months? during COVID France, Korea, 

Luxembourg, Sweden, 
(BEL) United States 
Have you attempted to 
actually commit suicide 
over the last 12 months? 
(KOR) 
Have you ever attempted 
suicide? 
(LUX) 

Suicide attempt Did you stay in a hospital Yes/No Lifetime or Australia, United States 

led to overnight or longer last 12 

hospitalisation or because you tried to kill months 


required medical yourself? 
care 


Received 
counselling 
following suicidal 


(USA) 


Following your thoughts of Yes/No 
suicide, did you talk to 
anyone? 


Lifetime or France, Switzerland, 
last 12 United States 
months 


thoughts or 
suicide attempt (CHE) 


During the past 12 months, 
did you get medical 
attention from a 


doctor or other health 
professional as a result of 
an attempt to kill yourself? 


(USA) 
Note: Results are shown for all OECD countries except Estonia, which did not participate in the 
questionnaire. Data for the United Kingdom include only surveys carried out by the Office for 
National Statistics on mental health and do not include the data collected by devolved 
administrations. 


Source: Responses to an OECD questionnaire sent to national statistical offices in January 2022. 
Table 2.9. Overview of questions about general mental health status 


Category Example question framing Answer options Frame of reference Already collected by country 


Self-reported general mental health status 

In general, how is your mental health? (CAN, CHL, DEU) 
How is your mental state, usually? (ISR) 

Excellent / Very good / Good / Fair / Poor (AUS, CAN, CHL, DEU) 
Very good / good / not so good / Not good at all (ISR) 

Current or last 4 weeks 

Australia, Canada, Chile, Costa Rica, Finland, Germany, Iceland, Israel, Slovenia, Switzerland, United States 


Self-reported During the past 30 days, 
number of how often was your mental 
mentally healthy health not good? 
days 
(USA) 


[Number of days] Last 30 days United States 


Yes / No General United States 


assessment 


Self-reported 
recovery 


At this time do you 
consider yourself to be in 
recovery or recovered from 
your own mental health 
problem? 


(USA) 


Self-reported On a scale from 1 to 10 can 0 (completely General 
satisfaction with you indicate to what extent dissatisfied) to 10 assessment 
mental health you are satisfied with your (completely 

status mental health? (NLD) satisfied) 


Netherlands, Norway 


Self-reported 
mental health 
status and 
COVID-19 


Mental health 
interferes with 
daily activities 
(impairment- 
days) 


How satisfied are you with 
your mental health? 


(NOR) 


Compared to before the 
pandemic started, how 
would you say your mental 
health is now? 


(CAN) 


Has your mental 
health/well-being been 
affected by the COVID-19 
pandemic during 2020 / 
during the last 12 months? 


(DNK, LVA, PRT, SVK, SVN, 
TUR) 


How has your morale been 
affected by the pandemic? 


(CHE) 


During the periods of 
confinement, have there 
been times when you have 
felt so discouraged that 
nothing could cheer you 
up? 


(FRA) 


Does your mental state 
interfere with your daily life 
at work? With family? 


(ISR) 


Have you felt very sad or 
hopeless for more than two 
weeks over the last 12 
months to a degree that 
you have experienced 
disruptions in your daily 
life? 


(KOR) 


During the past 12 months, 
did you ever feel so sad or 
hopeless almost every day 
for two weeks or more in a 
row that you stopped doing 
some usual activities? 


(USA) 


Much better now / During Canada, Denmark, 

Somewhat better COVID-19 Finland, France, 

now / About the Germany, Israel, Japan, 

same / Somewhat Latvia, Netherlands, 

worse now / Much Portugal, Slovak 

worse now Republic, Slovenia, 
Switzerland, Turkiye 

(CAN) 

1. Yes, has been 

negatively 

affected 


2. Yes, has been 
positively affected 


3. No, has not 
been affected 


(DNK, LVA, PRT, 
SVK, TUR) 


0 (much worse) - 
10 (much better) 


(CHE) 

Yes / No 

(FRA) 

Yes / No Varies from Australia, Canada, 
past 12 Hungary, Israel, Korea, 


months to Spain, United States 
past 4 weeks 


Note: Results are shown for all OECD countries except Estonia, which did not participate in the 
questionnaire. Data for the United Kingdom include only surveys carried out by the Office for 
National Statistics on mental health and do not include the data collected by devolved 
administrations. 


Source: Responses to an OECD questionnaire sent to national statistical offices in January 2022. 


Table 2.10. Overview of indicators of positive mental health 


Components Tool Abbreviation 
Positive WHO-5 WHO-5 
affect Wellbeing 
Index 
Positive and Short Form SF-12 
negative Health Status 
affect, 
functional 
impairment 
(Mental 
Health 
Component 
Summary) 
Positive and SF-36 SF-36 
negative 
affect, 
functional 
impairment 
Positive and SF-36 vitality EVI 
negative sub-scale 
affect 
Positive or Non- NA 
negative standardised 
affect affect questions 0 (least 
happy) - 10 
Example (happiest) 
questions: 
(FRA) 


During the day 
yesterday, did To a large 
you feel happy? extent / 
Certain / Not 


so much / Not 


(FRA) 
During this 


period [last 12 


months], to 


at all 
(ISR) 


what extent did Never, 


you experience almost never 


the following 


feelings? Stress 


and anxiety 
(ISR) 

Now, I am 
going to 


mention a 
series of 


sometimes, 


almost 


always or 


always 
(CHL) 


Number of 
items 


5 


12 


36 


Frame of 
reference 


Last 2 
weeks 


Last 4 
weeks 


Last 4 
weeks 


Last 4 
weeks 


Varies 
from 
yesterday 
to last 
year 


Already collected by country 


France, Hungary, Italy, Latvia, New 
Zealand, Slovenia 


Chile, Italy, Netherlands, New 
Zealand, Spain, United States 


Australia, Germany 


Australia, Belgium, Italy, Switzerland 


Chile, Costa Rica, Finland, France, 
Ireland, Israel, Italy, Japan, Latvia, 
Netherlands, New Zealand, Norway, 
Slovenia, Sweden, United Kingdom 


emotions or 
feelings. How 
often have you 


felt... during 

the last two 

weeks? 

Angry 

Optimistic 

Worried 

Happy 

Sad 

Calm 

Tired 

Useful 

(CHL) 

Eudaimonia Self-reported NA 1 General Australia, Austria, Belgium, Canada, 
feeling that life assessment Czech Republic, 
is worthwhile or Denmark, Finland, France, Germany, 
meaningful Greece, Hungary, Iceland, Ireland, 
Italy, Japan, Latvia, Lithuania, 
Example Luxembourg, Netherlands, New 
questions: Zealand, Norway, Poland, Portugal, 
Do you feel Slovak Republic, Slovenia, Spain, 
that what you Sweden, 
do in your life Switzerland, Turkiye, United 
has meaning, Kingdom 
value? 


Answer on a 
scale of 0 (no 
meaning) to 10 
(full of 
meaning) 


(FRA) 


How would you 
usually 
describe 
yourself? 


Would you say: 


1: Happy and 
interested in 
life 


2: Somewhat 
happy 


3: Somewhat 
unhappy 


4: Unhappy 
with little 


Eudaimonia 


Eudaimonia 


Eudaimonia 


interest in life 


5: So unhappy 
that life is not 
worthwhile 


(CAN) 
Self-reported NA 
quality of life 


Example 
question: 


Would you rate 
your quality of 
life as... ? 


1: Excellent 
2: Very good 
3: Good 

4: Fair 

5: Poor 


(CAN) 


Self-reported 1 Very 


satisfaction satisfied, 2 
with self Satisfied, 3 

Moderately 
Example satisfied, 4 
question: Dissatisfied, 
How satisfied 5 Very 


are you with... dissatisfied 


... yourself? 


(CRI) 


Self-reported NA 
sense of 

purpose, 
accomplishment or 
achievement of 
goals 


Example 
questions: 


So far, I have 
achieved the 
goals that are 
important to 
me in life 


(MEX) 


My life has a 
clear sense of 
purpose 


General Canada, Costa Rica, Finland, 
assessment Switzerland 


General Costa Rica, Finland 
assessment 


General Mexico, United States 
assessment 


Eudaimonia 


Life 
evaluation 


Life 
evaluation 


Life 
evaluation 


(USA) 


Most days I feel 
a sense of 
accomplishment 
from what I do 


(USA) 


Self-reported 
sense of being 
a beneficial 
participant of 
society 


Example 
question: 


How do you 
feel about 
yourself being 
an important 
and beneficial 
participant of 
the society? 


(HUN) 


Self-reported 
life satisfaction 


Example 
question: 


Overall, how 
satisfied are 
you with life as 
a whole these 
days? Please 
answer on a 
scale from 0 to 
10. 0 means 
“not at all 
satisfied” and 
10 means 
“completely 
satisfied”. 


(European 
OECD countries 
participating in 
EU-SILC well- 
being modules) 


Satisfaction 
with Life Scale 
(SWLS) 


Self-reported 
happiness 


Example 


General Hungary 
assessment 


General Australia, Austria, Belgium, Canada, 
assessment Chile, Czech Republic, 
Denmark, Finland, France, Germany, 
Greece, Hungary, Iceland, Ireland, 
Israel, Italy, Korea, Latvia, Lithuania, 
Luxembourg, Mexico, Netherlands, 
New Zealand, Norway, Poland, 
Portugal, Slovak 
Republic, Slovenia, Spain, 
Sweden, Switzerland, Turkiye, United 
Kingdom, United States 


General Norway, Slovenia 
assessment 


General Chile, France, Iceland, Japan, 
assessment Netherlands, Switzerland, United 
States 


Life 
evaluation 


Positive 
affect, 
eudaimonia, 
life 
satisfaction, 
social well- 
being 


Positive 
affect, 
eudaimonia, 
social well- 
being 


Positive 
affect, 
eudaimonia, 
social well- 
being 


Positive and 
negative 
affect, 
eudaimonia, 
self-esteem, 
concentration 


question: 


Overall, how 
happy do you 
think you are? 
Please check 
one box on a 
scale of 1-10 where 1 
means very 
unhappy and 
10 very happy. 


(ISL) 


Self-reported NA 
living 
conditions 


Example 
question: 


Currently the 
living 
conditions in 
your household 
are: 1. Very 
good; 2. Good; 
3. Fair; 4. Bad 


(COL) 


Mental Health MHC-SF 
Continuum 
Short Form 


Warwick- WEMWBS 
Edinburgh 
Mental Well- 


Being Scale 


Short Warwick- SWEMWBS 
Edinburgh 

Mental Well- 

Being Scale 


WHO Quality of WHOQOL- 
Life-BREF BREF 
psychological 

health domain 


14 


14 


General 
assessment 


Past 
month 


Last 2 
weeks 


Last 2 
weeks 


Last 2 
weeks 


Colombia 


Canada, Slovenia 


Finland, Norway* 


Canada, Finland, Germany, Iceland, 
Sweden, United Kingdom 


Chile 


Note: *Norway does not currently collect WEMWBS but indicated that the tool may be included in 
future rounds of the National Survey on Quality of Life. Countries in italics are those that have 
explicitly stated that they no longer collect the measure in question. Countries in bold did not 
report collecting the instrument in their official questionnaire submission, however, it was added 
by the OECD Secretariat based on the country’s participation in the European Union Statistics on 
Income and Living Conditions (EU-SILC), which contained the question "Overall, to what extent do 
you feel that the things you do in your life are worthwhile?" in the 2013 ad-hoc module focusing on 
well-being; the measure was not included again in 2018. Countries in bold and italics did not report 
collecting the instrument in their official questionnaire, however, it was added by the OECD 
Secretariat based on the country’s participation in a 2016 OECD questionnaire on subjective well- 
being measures. Results are shown for all OECD countries except Estonia, which did not participate 
in the questionnaire. Data for the United Kingdom include only surveys carried out by the Office for 
National Statistics on mental health and do not include the data collected by devolved 
administrations. For details of the tools collected by at least two OECD countries, see Annex 2.B. 


Source: Responses to an OECD questionnaire sent to national statistical offices in January 2022. 


Table 2.11. High-level overview of additional mental health-related topics 
collected by countries 


Types of Types of Indicators 


Topic Area tools Used Collected 


Already collected by country 


Access to/ Self-reported Sought care from a mental Australia, Canada, Chile, Colombia, Finland, 


use of non- health professional France, Ireland, Japan, Korea, Luxembourg, 
mental standardised (psychologist, psychiatrist, New Zealand, Slovenia, United States 
health questions etc.) 
services 
Medication prescribed or Belgium, Canada, Chile, Finland, France, 
taken (anti-depressants, Germany, Norway, Slovenia, Spain 
anxiolytics) 
Mental Standardised Strengths and Difficulties Australia, Belgium, Finland, France, Germany, 
health of screening tools, Questionnaire (SDQ); Italy, Slovenia, Spain, United Kingdom, 
children and diagnoses and KIDSCREEN-27 and United States 
young experienced KIDSCREEN-10; Screen for 
people symptoms Child Anxiety and Related 


Emotional Disorders 
(SCARED); Short Moods and 
Feelings Questionnaire 
(SMFQ) 


Diagnostic and reported Canada, Italy, Spain, Turkiye, United States 
experience of conduct 

disorders, behavioural and 

emotional issues, positive 

and adverse early 

childhood experiences, and 

substance use/abuse 


behaviours 
Loneliness Standardised Loneliness and social Australia, Austria, Belgium, Canada, 
and stress screening tools, connections: UCLA Colombia, Costa Rica, Czech Republic, 
non- Loneliness Scale, Oslo Denmark, Finland, France, 
standardised Social Support Scale; Germany, Greece, Hungary, Iceland, Ireland, 
self-reported Multidimensional Scale of Israel, Italy, Japan, Latvia, Lithuania, 
indicators Perceived Social Support; Luxembourg, Netherlands, New 


non-standardised indicators Zealand, Norway, Poland, 


Resilience, 


optimism 
and self- 
efficacy 


Attitudes 
towards 
mental 
health 


Standardised 
composite 
scales, non- 
standardised 
self-reported 
indicators 


Standardised 
composite 
scales, non- 
standardised 
self-reported 
indicators 


Portugal, Slovak Republic, 
Slovenia, Spain, Sweden, United 
States, United Kingdom 


Canada, Colombia, Iceland, Israel, Italy, 
Korea, Latvia, Slovenia, Sweden 


Stress: Cohen Perceived 
Stress Scale (PSS); non- 
standardised indicators 


Pearlin and Schooler’s Australia, Canada, Germany, Italy, Norway, 
Mastery Scale, General Self- Slovenia, Switzerland 

Efficacy Scale, Brief 

Resilient Coping Scale, 

Short Sense of Coherence 

Questionnaire, Connor- 

Davidson Resilience Scale 

(CD RISC-10), Single Item 

Self-esteem Scale; non- 

standardised indicators 


Non-standardised indicators Costa Rica, Hungary, Italy, Japan, Korea, 
covering topics of stigma, Mexico, New Zealand, Norway, Slovenia, 
discrimination, literacy and Sweden 

knowledge of mental health 

issues and resources 


Mental health literacy: Slovenia 
Depression and Anxiety 
Literacy questionnaire (D- 


Lit; A-Lit) 


Note: Results are shown for all OECD countries except Estonia, which did not participate in the 
questionnaire. Data for the United Kingdom include only surveys carried out by the Office for 
National Statistics on mental health and do not include the data collected by devolved 
administrations. Countries in bold did not report collecting the instrument in their official 
questionnaire submission, however, it was added by the OECD Secretariat based on the country’s 
participation in the European Health Interview Survey (EHIS), which contained the Oslo Social 
Support Scale (OSS-3) in waves 2 and 3. 


Source: Responses to an OECD questionnaire sent to national statistical offices in January 2022. 


Annex 2.B. Details on standardised survey 
tools to measure mental health 


Mental ill-health 


Mental health conditions: Structured interviews 


Composite International Diagnostic Interview (CIDI): The Composite 
International Diagnostic Interview (CIDI) is a comprehensive, fully-structured interview 
designed to be used by trained lay interviewers for the assessment of mental 
disorders according to the definitions and criteria of ICD-10 and DSM-IV (Kessler and 
Bedirhan Ustun, 2006[26]). A computer-assisted version of the interview is available 
along with a direct data entry software system that can be used to keypunch 
responses to the paper-and-pencil version of the interview. The CIDI is intended for 
use in epidemiological and cross-cultural studies as well as for clinical and research 
purposes. It allows investigators to measure the prevalence of lifetime and 12-month 
mental conditions, the severity and courses of these disorders, their impact on home 
management, work life, relationships and social life, and service and medication use. 
Several versions of the CIDI exist, but the latest version is the World Health 
Organization’s Composite International Diagnostic Interview (WHO-CIDI) V3.0 (Harvard 
Medical School, n.d.[27]). In total, the CIDI consists of a screening module and 40 
sections, 22 of which are diagnostic sections to assess mood (two sections), anxiety 
(seven sections), substance abuse (two sections), childhood (four sections) and other 
disorders (seven sections). The remaining sections assess functioning and physical 
comorbidity, risk factors, socio-demographic information and the treatment of mental 
disorders. The screening module, which includes a series of introductory questions 
about the respondent’s general health before delving into the diagnostic stem 
questions, has been shown to increase the accuracy of diagnostic assessments by 
reducing the effects of respondent fatigue and unwillingness to disclose on stem 
question endorsement (Harvard Medical School, n.d.[27]). 


Symptoms of mental ill-health: Screening tools 


The public health tools presented in this section focus mainly on royalty-free 
instruments, since fees and copyright restrictions might present a barrier to use. 


Mental Health Inventory (MHI-5): The Mental Health Inventory-5 (MHI-5) is a five-item 
scale to screen for symptoms of psychological distress. It is drawn from the 38-item 
Mental Health Inventory (MHI) and included in the 20-item and 36-item versions of the 
Short Form Health Survey (SF-20 and SF-36) (Berwick et al., 1991[28]; Kelly et al., 
2008[29]). The questions tap into both negative and positive affect, with three items 
focusing on low/depressed mood and two on nervousness/anxiety (although the tool 
itself is not used to present these aspects separately). The MHI-5 has been found to 
be a reliable measure of mental health status and has been validated against both 
depressive and, to a lesser degree, anxiety disorders (including generalised 
anxiety and panic disorder) in general population and patient samples in a range of 
countries (Yamazaki, Fukuhara and Green, 2005[30]; Hoeymans et al., 2004[31]; 
Elovainio et al., 2020[32]; Gill et al., 2007[33]; Rumpf et al., 2001[34]; 
Strand et al., 2003[35]; Thorsen et al., 2013[36]). There is some evidence 
that removing the two anxiety-related items does not reduce the effectiveness of the 
MHI in detecting depression, although this has not been examined in studies in which 
a formal diagnosis according to clinical criteria was used as a gold 
standard (Yamazaki, Fukuhara and Green, 2005[30]). 


Table 2.12. MHI-5 Questionnaire with scoring breakdown 


During the past month,                 All of    Most of   A good bit   Some of   A little of  None of 
how much of the time:                  the time  the time  of the time  the time  the time     the time 
1. Have you been a happy person?       1         2         3            4         5            6 
   (reverse coded) 
2. Have you felt calm and peaceful?    1         2         3            4         5            6 
   (reverse coded) 
3. Have you been a very nervous        1         2         3            4         5            6 
   person? 
4. Have you felt downhearted and       1         2         3            4         5            6 
   blue? 
5. Have you felt so down in the        1         2         3            4         5            6 
   dumps that nothing could cheer 
   you up? 


Note: Item scores (after reverse coding items 1 and 2) are added together to provide a total score 
from 5 to 30, which is then transformed into a variable ranging from 0 to 100 using a standard 
linear transformation. Higher values indicate better mental health, with the following cut-off 
points for various degrees of psychological distress: 68 or less, mild, moderate or severe; 
60 or less, moderate or severe; 52 or less, severe. 


Source: Kelly, M.J. et al. (2008[29]), “Evaluating cutpoints for the MHI-5 and MCS using the GHQ- 
12: A comparison of five different methods”, BMC Psychiatry, Vol. 8/10, 
https://doi.org/10.1186/1471-244X-8-10. 
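
The scoring rule described in the note to Table 2.12 can be sketched in a few lines of Python. This is an illustrative sketch only, not an official scoring routine: the function names `mhi5_score` and `mhi5_distress_band` are hypothetical, the "standard linear transformation" is assumed to rescale the raw sum of 5 to 30 linearly onto 0 to 100, and the cumulative cut-offs are read here as mutually exclusive bands, with "none" as an assumed label for scores above 68.

```python
from typing import Sequence

# Zero-based positions of the positively worded items (1. happy, 2. calm),
# which Table 2.12 marks as reverse coded.
REVERSE_CODED = frozenset({0, 1})


def mhi5_score(responses: Sequence[int]) -> float:
    """Return a 0-100 score from five answers coded 1 (all of the time) to 6 (none of the time)."""
    if len(responses) != 5:
        raise ValueError("MHI-5 needs exactly five responses")
    if any(r not in range(1, 7) for r in responses):
        raise ValueError("each response must be an integer from 1 to 6")
    # Reverse coding maps 1 -> 6, 2 -> 5, ..., 6 -> 1 so that higher is always better.
    coded = [7 - r if i in REVERSE_CODED else r for i, r in enumerate(responses)]
    raw = sum(coded)  # ranges from 5 to 30
    # Assumed linear rescaling of the 5-30 raw sum onto 0-100.
    return (raw - 5) * 100 / 25


def mhi5_distress_band(score: float) -> str:
    """Map a 0-100 score to a band using the cut-offs in the table note (assumed exclusive bands)."""
    if score <= 52:
        return "severe"
    if score <= 60:
        return "moderate"
    if score <= 68:
        return "mild"
    return "none"  # assumed label: above all reported cut-offs
```

For example, a respondent answering "a good bit of the time" (3) to the two positive items and "some of the time" (4) to the three negative items has a raw sum of 20, which this sketch rescales to 60 and places in the moderate band.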


The Short-Form Health Survey (SF-12): The Short-Form Health Survey (SF-12) isa 
tool to measure health-related quality of life. |t was developed as a shorter 
alternative to the SF-36 questionnaire to be used in the general population and in 
large surveys and contains up to two items for each of the SF-36’s eight dimensions: 
general mental health, energy and fatigue, bodily pain, general health perceptions, 
limitations on physical activity due to health, limitations on social activity due to 
physical or emotional conditions, limitations on day-to-day activities due to physical 
health, and limitations on day-to-day activities due to emotional health (Ware et al., 
2002[37]). Anumber of questions in both the SF-12 and SF-36 are taken directly from 
the Mental Health Inventory (MHI), which also features the MHI-5 free-standing scale 
in its own right (See above) (RAND, n.d.[38]). Two summary scores, the Physical 
Component Summary (PCS) and the Mental Component Summary (MCS), can be 
derived from the SF-12, and a range of scoring methods have been validated against 
both active and recent depressive disorders and to a lesser degree also anxiety 
disorders in general population samples (Ware et al., 2002[37]; Gill et al.,
2007[33]; Vilagut et al., 2013[39]). Some evidence suggests that the SF-12’s physical
health dimensions might be more strongly related to mental health in low-income
settings, with implications for context-specific weights (Ohrnberger et al., 2020[40]).
The SF-12 is subject to copyright restrictions and can thus not be republished in this
report (Quality Metric, n.d.[41]).


Kessler Scale (K10/ K6): The Kessler psychological distress scale, which is most often 
used in its 10-item (K10) and 6-item (K6) form, is a screening tool for identifying 
adults with significant levels of psychological distress. The questions focus on somatic 
symptoms and negative affect, particularly on both low-depressed mood and 
nervousness/anxiety. While these aspects are usually not presented separately and a
total score for distress is usually used, factor analysis has established depression and
anxiety as distinct clusters in the K10 (Brooks, Beard and Steel, 2006[42]). Indeed,
although it is often applied in primary clinical settings as well, it was designed for use
in the general population, and sensitivity and specificity analyses support both K6 and
K10 as screening instruments to identify likely community cases of anxiety and 
depression (Slade, Grove and Burgess, 2011[43]). Furthermore, they have been 
extensively validated, including in cross-cultural settings, against diagnostic interview 
evaluations of anxiety and affective disorders, with lesser but significant associations 
with other mental disorder categories and with the presence of any current mental 
disorder (Andrews and Slade, 2001[44]). There is also some evidence that the Kessler 
scales can be used successfully (with lower cut-off scoring criteria) to capture
individuals struggling with more moderate psychological distress that nonetheless
warrants mental health intervention (Prochaska et al., 2012[45]).


Table 2.13. Kessler Scale 10/6 Questionnaire with scoring breakdown 


                                                None of   A little of  Some of   Most of   All of
                                                the time  the time     the time  the time  the time
During the last 30 days, about how often did you feel:
1. Tired out for no good reason?                1         2            3         4         5
2. Nervous?                                     1         2            3         4         5
3. So nervous that nothing could calm you down? 1         2            3         4         5
4. Hopeless?                                    1         2            3         4         5
5. Restless or fidgety?                         1         2            3         4         5
6. So restless you could not sit still?         1         2            3         4         5
7. Depressed?                                   1         2            3         4         5
8. That everything was an effort?               1         2            3         4         5
9. So sad that nothing could cheer you up?      1         2            3         4         5
10. Worthless?                                  1         2            3         4         5


Note: All items are added together to provide a total score, where higher values indicate worse 
mental health. However, different scoring methods for both K10 and K6 scales have been used 
depending on the country and institutional context. For instance, in the United States, answers are 
coded from 0-4 (leading to a maximum possible score of 40 for the K10 and 24 for the K6), 
whereas in Australia, 1-5 as shown in the table above have been used (leading to a maximum 
possible score of 50 for the K10 and 30 for the K6). The K10 scoring used in Australian health
surveys has typically been as follows: 10-15 low, 16-21 moderate, 22-29 high, 30-50 very high
psychological distress. For the K6 scoring, respondents with scores of 13 (in the 0-4 coding)/ 19 (in 
the 1-5 coding) or higher are typically classified as having a probable serious mental illness. Cut-off 
scores in other contexts might vary. 
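
The two coding conventions and the cut-offs described in the note can be illustrated with a short sketch. The function names are hypothetical, the low band of the Australian K10 scoring is taken to be 10-15, and the thresholds apply only to the contexts named in the note.

```python
def k10_category_au(total):
    """Classify a K10 total under the Australian 1-5 item coding (range 10-50)."""
    if not 10 <= total <= 50:
        raise ValueError("K10 total in 1-5 coding must lie between 10 and 50")
    if total <= 15:
        return "low"
    if total <= 21:
        return "moderate"
    if total <= 29:
        return "high"
    return "very high"


def k6_probable_serious_mental_illness(total, coding="0-4"):
    """Apply the K6 cut-off: 13 or more (0-4 coding) or 19 or more (1-5 coding)."""
    thresholds = {"0-4": 13, "1-5": 19}
    if coding not in thresholds:
        raise ValueError("coding must be '0-4' or '1-5'")
    return total >= thresholds[coding]


def recode_to_zero_based(responses):
    """Convert item answers from the 1-5 coding to the 0-4 coding."""
    return [r - 1 for r in responses]


if __name__ == "__main__":
    answers = [4, 4, 3, 4, 3, 3]  # six K6 items in the 1-5 coding
    print(k6_probable_serious_mental_illness(sum(answers), coding="1-5"))
    print(k6_probable_serious_mental_illness(sum(recode_to_zero_based(answers))))
```

Because the two K6 thresholds differ by exactly six points (one per item), the same respondent is classified identically under either coding.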


Source: ABS (2007[46]), Information Paper: Use of the Kessler Psychological Distress Scale in ABS
Health Surveys, Australian Bureau of Statistics,
https://www.abs.gov.au/ausstats/abs@.nsf/lookup/4817.0.55.001chapter92007-08;
Kessler, R. et al. (2010[47]), “Screening for serious mental illness in the general population with
the K6 screening scale: Results from the WHO World Mental Health (WMH) survey
initiative”, International Journal of Methods in Psychiatric Research, Vol. 19/S1, pp. 4-22,
https://doi.org/10.1002/mpr.310.


General Health Questionnaire (GHQ-12): The 12-item General Health 
Questionnaire (GHQ-12) is a measure to detect psychological distress by focusing on 
affect (negative and positive), somatic symptoms and the functional impairment of
respondents. The GHQ-12 has been translated into many languages and extensively
validated in general and clinical populations worldwide (particularly against 
depression and anxiety disorders), including among adolescent samples (Hankins, 
2008[48]; Gilbody, 2001[49]; Baksheev et al., 2011[50]). Although the GHQ-12 was
originally intended as a unidimensional measure, there is some debate about its
dimensionality, with many factor-analytical studies supporting a range of
multidimensional structures (e.g. anxiety and depression, social dysfunction, loss of
confidence) (Gao et al., 2004[51]). However, more recent evidence points to these results likely being
an expression of method-specific variance caused by item wording, supporting the 
notion that treating the scale as a unitary construct would minimise bias (Hystad and 
Johnsen, 2020[52]). The GHQ-12 is subject to copyright restrictions and can thus not 
be republished in this report. 


Patient Health Questionnaire (PHQ-9/ PHQ-8): The full Patient Health Questionnaire 
(PHQ) contains 59 questions, with modules focusing on mood, anxiety, alcohol, eating 
and somatoform disorders. The PHQ-9 is a nine-question survey designed to detect 
the presence and severity of depressive symptoms, and it directly maps onto the 
DSM-IV and DSM-5 symptom criteria for major depressive disorder. The PHQ-8 
questionnaire removes the final question regarding suicidal ideation. While a one- 
factor structure for both the PHQ-8/9 has been identified, more recent studies support 
a two-factor model composed of affective and somatic factors (Sunderland et al., 
2019[53]). Both instruments have shown acceptable diagnostic screening properties
across various population and clinical settings, age groups, and cultures/ethnicities,
in addition to being reliable and valid measures of depression severity (Manea,
Gilbody and McMillan, 2012[54]; Moriarty et al., 2015[55]; Kroenke et al.,
2009[56]; Huang et al., 2006[57]; Kroenke, Spitzer and Williams, 2001[58]; Richardson
et al., 2010[59]). The close alignment between the PHQ-8/9 and the DSM makes them
subject to the same criticism, including a potentially Western-focused construct of
depression, relative to longer self-reported scales with less constrained symptom sets
(Zimmerman et al., 2012[60]; Haroz et al., 2017[61]).


Table 2.14. PHQ-9/8 questionnaire with scoring breakdown 


                                                           Not     Several  More than  Nearly
                                                           at all  days     half the   every
                                                                            days       day
Over the last two weeks, how often have you been bothered by any of the following problems:
1. Little interest or pleasure in doing things             0       1        2          3
2. Feeling down, depressed or hopeless                     0       1        2          3
3. Trouble falling or staying asleep, or sleeping too much 0       1        2          3
4. Feeling tired or having little energy                   0       1        2          3
5. Poor appetite or overeating                             0       1        2          3
6. Feeling bad about yourself - or that you are a failure  0       1        2          3
   or have let yourself or your family down
7. Trouble concentrating on things, such as reading the    0       1        2          3
   newspaper or watching television
8. Moving or speaking so slowly that other people could    0       1        2          3
   have noticed. Or the opposite - being so fidgety or
   restless that you have been moving around a lot more
   than usual
9. Thoughts that you would be better off dead or of        0       1        2          3
   hurting yourself in some way


Note: The last item (item 9) is the question on suicidal ideation that is added for the PHQ-9.
Scoring can be done in two ways: (1) via an “algorithm diagnosis” of either major depression or
other depression; or (2) via summing all items and applying different cut-off scores for depression
severity. In the algorithm diagnosis that adheres to DSM definitions, the first or second item
(depressed mood or anhedonia) has to be present at least “more than half the days”; combined
with at least 5 of the total symptoms, or with 2 to 4 symptoms, also present at this frequency, this
constitutes major depression or other depression, respectively. In the second form of
categorisation, all items are added together to provide a total score of depression severity, with
scores ranging from 0-24 for the PHQ-8 and 0-27 for the PHQ-9: 0-4 none, 5-9 mild depression,
10-14 moderate depression, 15-19 moderately severe depression, 20-24/27 severe depression. A
score of ≥10 typically represents clinically significant depression regardless of diagnostic
status.
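
Both scoring approaches in the note can be sketched in a few lines. The sketch below is illustrative only: the function names are hypothetical, and the algorithm follows the simplified rule stated in the note (a symptom counts when it is reported at least “more than half the days”, i.e. coded 2 or 3).

```python
def phq_severity(responses):
    """Sum PHQ-8 or PHQ-9 answers (coded 0-3) and return (total, severity band)."""
    if len(responses) not in (8, 9) or any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("expected 8 or 9 responses coded 0 to 3")
    total = sum(responses)
    if total <= 4:
        band = "none"
    elif total <= 9:
        band = "mild"
    elif total <= 14:
        band = "moderate"
    elif total <= 19:
        band = "moderately severe"
    else:
        band = "severe"
    return total, band


def phq_algorithm(responses):
    """Apply the algorithm diagnosis as described in the note."""
    present = [r >= 2 for r in responses]   # "more than half the days" or more
    core_present = present[0] or present[1]  # first or second item
    n_symptoms = sum(present)
    if core_present and n_symptoms >= 5:
        return "major depression"
    if core_present and 2 <= n_symptoms <= 4:
        return "other depression"
    return "no depression"


if __name__ == "__main__":
    answers = [3, 3, 2, 2, 2, 0, 0, 0, 0]
    print(phq_severity(answers), phq_algorithm(answers))
```

The two approaches need not agree: a respondent reporting many symptoms on “several days” can reach a total of 10 or more without meeting the algorithm criteria.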


Source: Kroenke, K. et al. (2009[56]), “The PHQ-8 as a measure of current depression in the
general population”, Journal of Affective Disorders, Vol. 114/1-3, pp. 163-173,
https://doi.org/10.1016/j.jad.2008.06.026; Kroenke, K. et al. (2001[58]), “The PHQ-9: Validity
of a brief depression severity measure”, Journal of General Internal Medicine, Vol. 16/9, pp. 606-
613, http://dx.doi.org/10.1046/j.1525-1497.2001.016009606.x.


The Generalised Anxiety Disorder Questionnaire (GAD-7/GAD-2): The 
Generalised Anxiety Disorder Questionnaire (GAD-7) comprises seven questions about 
the frequency of broad anxiety-related problems in the past two weeks. It was 
developed for screening and severity assessment of Generalised Anxiety Disorder, 
and the items cover most but not all symptoms of this disorder listed in the DSM-IV
and DSM-5 (excessive worry, difficulty controlling the worry, restlessness and irritability,
but not e.g. fatigue, muscle tension, sleep disturbance). Research supports a
unidimensional structure for the scale (Sunderland et al., 2019[53]). The GAD-7 has
demonstrated good internal consistency, convergent validity, and sensitivity to
change in both patient and population samples (Löwe et al., 2008[62]; Beard and
Björgvinsson, 2014[63]). While the scale has been successfully translated into
multiple languages and local dialects, more research on potential cross-cultural bias 
of the tool needs to be conducted (Parkerson et al., 2015[64]; Sunderland et al., 
2019[53]). The scale focuses on general symptoms of anxiety and was not developed 
to assess the presence of other anxiety disorders, such as Social Anxiety Disorder. 
However, some researchers have argued that it can be used across different anxiety 
disorders, given the scale’s emphasis on the transdiagnostic process of worry and the 
fact that Generalised Anxiety Disorder has a high degree of comorbidity (Johnson
et al., 2019[65]). The shorter GAD-2 version of this scale focuses only on the first two
items (worry and difficulty controlling the worry), i.e. the core criteria of generalised
anxiety per the DSM. Available evidence has indicated support for its psychometric 
properties and validity in a range of settings (Byrd-Bredbenner, Eck and Quick, 
2021[66]; Hughes et al., 2018[67]; Luo et al., 2019[68]; Ahn, Kim and Choi, 2019[69]). 


Table 2.15. GAD-7/GAD-2 Questionnaire with scoring breakdown 


                                                        Not at  Several  More than      Nearly
                                                        all     days     half the days  every day
Over the last two weeks, how often have you been bothered by any of the following problems:
1. Feeling nervous, anxious or on edge                  0       1        2              3
2. Not being able to stop or control worrying           0       1        2              3
3. Worrying too much about different things             0       1        2              3
4. Trouble relaxing                                     0       1        2              3
5. Being so restless that it is hard to sit still       0       1        2              3
6. Becoming easily annoyed or irritable                 0       1        2              3
7. Feeling afraid as if something awful might happen    0       1        2              3


Note: Items 1 and 2 represent the 2-item shorter version of the scale (GAD-2). All items are added
together to provide a total score ranging from 0-21 for the GAD-7, with higher scores indicating the 
presence of more anxiety symptomatology: 0-4 none, 5-9 mild anxiety, 10-14 moderate anxiety, 
15-21 severe anxiety. For the GAD-2, a score of 3 points is the suggested cut-off for identifying 
possible cases for which further diagnostic evaluation for generalised anxiety disorder is 
warranted. 
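
As a hedged illustration of the note, the sketch below scores the GAD-7 and derives the GAD-2 flag from the first two items. The function name and the returned dictionary keys are hypothetical.

```python
def gad7_score(responses):
    """Score seven GAD-7 answers coded 0-3 and flag the GAD-2 cut-off."""
    if len(responses) != 7 or any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("expected seven responses coded 0 to 3")
    total = sum(responses)  # ranges from 0 to 21
    if total <= 4:
        severity = "none"
    elif total <= 9:
        severity = "mild"
    elif total <= 14:
        severity = "moderate"
    else:
        severity = "severe"
    gad2_total = responses[0] + responses[1]  # first two items, range 0-6
    return {
        "total": total,
        "severity": severity,
        "gad2_total": gad2_total,
        # A GAD-2 score of 3 or more suggests further diagnostic evaluation.
        "gad2_flag": gad2_total >= 3,
    }


if __name__ == "__main__":
    print(gad7_score([2, 2, 1, 1, 1, 1, 1]))
```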


Source: Spitzer, R. et al. (2006[70]), “A brief measure for assessing generalized anxiety disorder:
The GAD-7”, Archives of Internal Medicine, Vol. 166/10, pp. 1092-1097,
http://dx.doi.org/10.1001/ARCHINTE.166.10.1092.


Patient Health Questionnaire (PHQ-4): The PHQ-4 screening tool is a short, four- 
question tool to identify the presence and severity of core symptoms of both 
depression and anxiety, given that these are two of the most prevalent illnesses 
among the general population and often comorbid. The PHQ-4 pulls the two core 
depression-related questions from the PHQ-9/8 (which together are called the PHQ-2) 
plus two core anxiety-related questions from GAD-7 (which are called the GAD-2). 
Thus, the PHQ-4 is a combination of the PHQ-2 and GAD-2, which have independently 
been shown to be good, brief screening tools with construct and criterion validity (see 
above). Available evidence supports the PHQ-4’s psychometric properties, reliability
and validity in studies focused on the general population, interventions, workers
and college students (Stanhope, 2016[71]; Khubchandani et al., 2016[72]; Löwe et al.,
2010[73]).


Table 2.16. PHQ-4 Questionnaire with scoring breakdown 


                                                        Not at  Several  More than      Nearly
                                                        all     days     half the days  every day
Over the last two weeks, how often have you been bothered by any of the following problems:
1. Feeling nervous, anxious or on edge                  0       1        2              3
2. Not being able to stop or control worrying           0       1        2              3
3. Feeling down, depressed or hopeless                  0       1        2              3
4. Little interest or pleasure in doing things          0       1        2              3


Note: All items are added together to provide a total score of psychological distress ranging from 
0-12, with higher scores indicating the presence of more symptomatology: 0-2 normal, 3-5 mild, 6- 
8 moderate, 9-12 severe. A total score greater than or equal to 3 for the first two items (GAD-2) 
indicates that the respondent is at risk for anxiety. A total score greater than or equal to 3 for the 
final two items (PHQ-2) indicates that the respondent is at risk for depression. 
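
The three outputs described in the note (overall distress band, anxiety flag and depression flag) can be derived together, as in the following minimal sketch. The function name and dictionary keys are hypothetical.

```python
def phq4_score(responses):
    """Score four PHQ-4 answers coded 0-3, in the item order of the table."""
    if len(responses) != 4 or any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("expected four responses coded 0 to 3")
    total = sum(responses)  # ranges from 0 to 12
    if total <= 2:
        band = "normal"
    elif total <= 5:
        band = "mild"
    elif total <= 8:
        band = "moderate"
    else:
        band = "severe"
    return {
        "total": total,
        "band": band,
        # Items 1-2 form the GAD-2, items 3-4 the PHQ-2; each has a cut-off of 3.
        "anxiety_risk": responses[0] + responses[1] >= 3,
        "depression_risk": responses[2] + responses[3] >= 3,
    }


if __name__ == "__main__":
    print(phq4_score([2, 1, 0, 1]))
```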


Source: Kroenke, K. et al. (2009[74]), “An ultra-brief screening scale for anxiety and depression: 
The PHQ-4”, Psychosomatics, Vol. 50/6, pp. 613-621, http://dx.doi.org/10.1176/APPI.PSY.50.6.613. 


Washington Group on Disability Statistics Short Set on Functioning -
Enhanced (WG-SS Enhanced): The Washington Group Short Set on Functioning - Enhanced
(WG-SS Enhanced) was developed by the Washington Group on Disability Statistics, 
which is composed of representatives from National Statistics Offices, as well as UN 
agencies, international non-governmental organisations and organisations for people 
who are disabled, to capture not only the presence but also the type and severity of a 
respondent’s disability for use in population and special interest surveys (Washington 
Group on Disability Statistics, 2020[75]). Its focus is on functioning in the areas of 
seeing, hearing, walking or climbing stairs, remembering or concentrating, self-care, 
communication, upper body activities, as well as affect. The four questions on the 
latter focus on symptoms of depression and anxiety, though the questionnaire is not 
typically used in its subcomponent parts. Regardless, the focus on overall functioning
might offer an important way forward for capturing transdiagnostic symptoms of mental
ill-health.


Table 2.17. WG-SS Enhanced Questionnaire 


Response options for questions 1-8: No difficulty / Some difficulty / A lot of difficulty /
Cannot do at all

Do you have difficulty:
1. Seeing, even when wearing your glasses?
2. Hearing, even when using a hearing aid(s)?
3. Walking or climbing steps?
4. Using your usual language, communicating, for example understanding or being
   understood?
5. Remembering or concentrating?
6. With self-care, such as washing all over or dressing?
7. Raising a 2-liter bottle of water or soda from waist to eye level?
8. Using your hands and fingers, such as picking up small objects, for example, a
   button or pencil, or opening or closing containers or bottles?

Response options for questions 9 and 11: Daily / Weekly / Monthly / A few times a year / Never
Response options for questions 10 and 12: A little / A lot / Somewhere in between a little
and a lot

9. How often do you feel worried, nervous or anxious?
10. Thinking about the last time you felt worried, nervous or anxious, how would you
    describe the level of these feelings?
11. How often do you feel depressed?
12. Thinking about the last time you felt depressed, how depressed did you feel?


Note: Different domain-specific identifiers of functioning (and the severity of its impairment) can 
be calculated for an overall disability identifier. The recommended level of inclusion is: “a lot of 
difficulty” or “cannot do at all” for at least one of the first six questions, severity levels 3 or 4 for 
the two upper-body mobility questions, and severity level 4 for the anxiety or depression 
indicators. Items 1 to 6 represent the 6-item shorter version of the scale (Washington Group on
Disability Statistics Short Set on Functioning), which excludes questions on mental health and 
upper body functioning. 


Source: Washington Group on Disability Statistics (2020[75]), The Washington Group Short Set on 
Functioning: Enhanced (WG-SS Enhanced), The Washington Group Data Collection Tools and their 
Recommended Use (washingtongroup-disability.com). 


Alcohol Use Disorders Identification Test/Concise (AUDIT/ AUDIT-C): The Alcohol Use 
Disorders Identification Test (AUDIT) is a 10-item alcohol screen developed by the 
WHO from the 1980s onwards that can help identify respondents or patients who are 
hazardous drinkers or have active alcohol use disorders (including alcohol abuse or 
dependence). Its validity has been demonstrated in settings beyond primary care, 
such as inpatient hospital wards, emergency departments, universities, workplaces, 
outpatient settings and psychiatric services (Berner et al., 2007[76]). Its short version
of 3 items, designed to be integrated into routine patient interviews, has been found
to have similar accuracy to the full-scale version and has been validated primarily in
primary-care settings, as well as increasingly in more general population samples, 
including adults seeking online help with drinking (Bush et al., 1998[77]; Khadjesari 
et al., 2017[78]). 


Table 2.18. AUDIT/ AUDIT-C Questionnaire with scoring breakdown 


                                                    Never   Monthly    2-4 times  2-3 times  4 or more
                                                            or less    a month    a week     times a week
1. How often do you have a drink containing         0       1          2          3          4
   alcohol?

                                                    1 or 2  3 to 4     5 to 6     7 to 9     10 or more
2. How many standard drinks containing alcohol      0       1          2          3          4
   do you have on a typical day?

                                                    Never   Less than  Monthly    Weekly     Daily or
                                                            monthly                          almost daily
3. How often do you have six or more drinks on      0       1          2          3          4
   one occasion?
4. How often during the last year have you found    0       1          2          3          4
   that you were not able to stop drinking once
   you had started?
5. How often during the last year have you failed   0       1          2          3          4
   to do what was normally expected from you
   because of drinking?
6. How often during the last year have you needed   0       1          2          3          4
   a first drink in the morning to get yourself
   going after a heavy drinking session?
7. How often during the last year have you had a    0       1          2          3          4
   feeling of guilt or remorse after drinking?
8. How often during the last year have you been     0       1          2          3          4
   unable to remember what happened the night
   before because you had been drinking?

                                                    No      Yes, but not in  Yes, during
                                                            the last year    the last year
9. Have you or someone else been injured as a       0       2                4
   result of your drinking?
10. Has a relative or friend or a doctor or another 0       2                4
    health worker been concerned about your
    drinking or suggested you cut down?


Note: Items 1 to 3 represent the 3-item shorter version of the scale (AUDIT-C). All items are
added together to provide a total score ranging from 0-40 (0-12 for the AUDIT-C), with higher 
scores indicating higher likelihood that a person’s drinking is affecting his or her safety. For the 
AUDIT, scores of 8 or more are recommended as indicators of hazardous and harmful alcohol use, 
as well as possible alcohol dependence. Since the effects of alcohol vary with average body weight 
and differences in metabolism, establishing the cut-off point for all women and men over age 65 
one point lower at a score of 7 will increase sensitivity for these population groups. For the AUDIT- 
C, in men (women), a score of 4 (3) or more is considered as identifying symptoms of hazardous 
drinking or active alcohol use disorders. 
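
The cut-off rules in the note depend on sex and age, which a small sketch can make explicit. The function name, the "female"/"male" labels and the returned keys are hypothetical, and the AUDIT-C is assumed to consist of the first three items of the table.

```python
def audit_score(responses, sex, age):
    """Score ten AUDIT answers and apply the cut-offs described in the note."""
    if len(responses) != 10:
        raise ValueError("expected ten responses")
    if any(r not in (0, 1, 2, 3, 4) for r in responses[:8]):
        raise ValueError("items 1-8 must be coded 0 to 4")
    if any(r not in (0, 2, 4) for r in responses[8:]):
        raise ValueError("items 9-10 must be coded 0, 2 or 4")
    if sex not in ("female", "male"):
        raise ValueError("sex must be 'female' or 'male'")

    total = sum(responses)        # ranges from 0 to 40
    audit_c = sum(responses[:3])  # ranges from 0 to 12

    # Cut-off of 8, lowered to 7 for all women and for men over age 65.
    audit_cutoff = 7 if sex == "female" or age > 65 else 8
    # AUDIT-C cut-off: 3 or more for women, 4 or more for men.
    audit_c_cutoff = 3 if sex == "female" else 4

    return {
        "audit_total": total,
        "audit_positive": total >= audit_cutoff,
        "audit_c_total": audit_c,
        "audit_c_positive": audit_c >= audit_c_cutoff,
    }


if __name__ == "__main__":
    print(audit_score([2, 1, 1, 0, 0, 0, 0, 0, 2, 0], sex="male", age=40))
```

In the usage example the full-scale total of 6 stays below the cut-off of 8, while the AUDIT-C total of 4 meets the cut-off for men, which shows that the two versions can classify the same respondent differently.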


Source: Bush, K. et al. (1998[77]), “The AUDIT alcohol consumption questions (AUDIT-C): An
effective brief screening test for problem drinking”, Archives of Internal Medicine,
Vol. 158/16, https://doi.org/10.1001/archinte.158.16.1789; WHO (2001[79]), AUDIT: The Alcohol
Use Disorders Identification Test: Guidelines for use in primary health care, World Health
Organization,
https://www.who.int/publications/i/item/audit-the-alcohol-use-disorders-identification-test-guidelines-for-use-in-primary-health-care.


Positive mental health 


Core questions from the OECD Guidelines on Measuring Subjective Well- 
being: The OECD Guidelines on Subjective Well-being propose a minimal set of 
measures of subjective well-being covering both life evaluation and (short-term) affect 
that could be included in household surveys (OECD, 2013[80]). The core measures 
included are the ones which have the strongest evidence when it comes to validity 
and relevance, and for which international comparability is the most important. An 
experimental measure of an aspect of eudaimonic well-being is also included. 


Table 2.19. OECD core questions on subjective well-being 




The following question asks how satisfied you feel, on a scale from 0 to 10. Zero means you feel
“not at all satisfied” and 10 means you feel “completely satisfied”. 


1. Overall, how satisfied are you with life as a whole these days? 


The following question asks how worthwhile you feel the things you do in your life are, on a scale 
from 0 to 10. Zero means you feel the things you do in your life are “not at all worthwhile”, and 10 
means “completely worthwhile”. 


2. Overall, to what extent do you feel the things you do in your life are worthwhile? 


The following questions ask about how you felt yesterday on a scale from 0 to 10. Zero means you 
did not experience the feeling “at all” yesterday while 10 means you experienced the feeling “all of 
the time” yesterday. I will now read out a list of ways you might have felt yesterday.


3. How about happy? 
4. How about worried? 


5. How about depressed? 


Note: The three questions on affect (3-5) should be included as a group and are intended to 
provide a minimal set of questions required to characterise the affective state of the respondent 
on the previous day. 


Source: OECD (2013[80]), OECD Guidelines on Measuring Subjective Well-being, OECD Publishing,
Paris, https://doi.org/10.1787/9789264191655-en. 


WHO-5 Well-being index (WHO-5): The World Health Organization Well-Being 
Index (WHO-5) is a short questionnaire of 5 items that focus on a respondent’s 
positive affect. The questionnaire, adapted from the longer WHO/ICD-10 Depression
Diagnosis and DSM-IV Depression scale by selecting a subset of positively phrased
items, was first used in a project on well-being measures in primary health care
by the WHO Regional Office in Europe in 1998 and has since been translated into
more than 30 languages (World Health Organization, 1998[81]; Topp et al., 2015[7]).
The WHO-5 has been applied as a generic scale for well-being across a wide range of 
study fields and countries, as a sensitive screening tool for depression as well as an 
outcome measure in clinical trials (Topp et al., 2015[7]). Studies of younger and 
elderly persons indicated a unidimensional structure for this scale (Topp et al., 
2015[7]). 


Table 2.20. WHO-5 questionnaire with scoring breakdown 


                                              All of    Most of   More than      Less than      Some of   At no
                                              the time  the time  half the time  half the time  the time  time
Over the past two weeks...
1. I have felt cheerful and in good spirits   5         4         3              2              1         0
2. I have felt calm and relaxed               5         4         3              2              1         0
3. I have felt active and vigorous            5         4         3              2              1         0
4. I woke up feeling fresh and rested         5         4         3              2              1         0
5. My daily life has been filled with things  5         4         3              2              1         0
   that interest me


Note: All items are added together to provide a total score from 0 to 25, which is then multiplied by 
4 to normalise to a 0 (worst possible well-being) to 100 (best possible well-being) score. A cut-off
score of less than or equal to 50, or less than or equal to 52 (Sandor et al., 2021[82]), is often used
as indicative of reduced well-being, which has been validated in studies using the WHO-5 for the 
screening of depression and for predicting patient mortality. 
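
The normalisation and cut-off in the note amount to a sum, a multiplication by 4 and a comparison. The sketch below uses a hypothetical function name and lets the caller choose between the two cut-offs mentioned (50 by default, or 52).

```python
def who5_score(responses, cutoff=50):
    """Return the 0-100 WHO-5 score and whether it indicates reduced well-being."""
    if len(responses) != 5 or any(r not in (0, 1, 2, 3, 4, 5) for r in responses):
        raise ValueError("expected five responses coded 0 to 5")
    raw = sum(responses)  # ranges from 0 to 25
    score = raw * 4       # ranges from 0 to 100
    return score, score <= cutoff


if __name__ == "__main__":
    print(who5_score([3, 3, 2, 2, 2]))
    print(who5_score([3, 3, 3, 2, 2], cutoff=52))
```

Because the score moves in steps of 4, the choice between 50 and 52 only changes the classification of respondents whose raw total is exactly 13.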


Source: Topp, C. et al. (2015[7]), “The WHO-5 well-being index: A systematic review of the
literature”, Psychotherapy and Psychosomatics, Vol. 84/3, pp. 167-176,
https://doi.org/10.1159/000376585.


SF-36 Energy/Vitality subscale: The 4-item vitality subscale of the larger SF-36
measure (see above) is a general measure of energy/fatigue. It has been validated in
clinical settings and performed well compared to longer scales (e.g. for cancer-related
fatigue) (Brown et al., 2011[83]).


Table 2.21. SF-36 vitality subscale questionnaire with scoring breakdown 


                                 All of    Most of   A good bit   Some of   A little of  None of
                                 the time  the time  of the time  the time  the time     the time
How much of the time during the past 4 weeks...
1. Did you feel full of pep?     1         2         3            4         5            6
2. Did you have a lot of energy? 1         2         3            4         5            6
3. Did you feel worn out?        1         2         3            4         5            6
4. Did you feel tired?           1         2         3            4         5            6


Note: Standardised scores range from 0-100, with lower scores indicating greater fatigue. Scores 
<45 have been established as representing clinically significant fatigue. 


Source: Ware, J. et al. (1993[84]), SF-36 Health Survey: Manual and Interpretation Guide, The
Health Institute, New England Medical Center Hospitals (accessed on
22 January 2023); Donovan, K. et al. (2008[85]), “Identifying clinically meaningful fatigue with the
Fatigue Symptom Inventory”, Journal of Pain and Symptom Management, Vol. 36/5, pp. 480-487,
https://doi.org/10.1016/j.jpainsymman.2007.11.013.


Satisfaction with Life Scale (SWLS): The Satisfaction with Life Scale was 
developed to assess people’s satisfaction and evaluation of their lives as a whole, 
rather than focusing on specific life domains. Early studies have found it to show good 
convergent validity with other types of subjective well-being, while being distinct from 
affective well-being measures (Pavot et al., 1991[86]; Pavot and Diener, 1993[87]). 


Table 2.22. SWLS questionnaire with scoring breakdown 


                                           Strongly  Agree  Slightly  Neither    Slightly  Disagree  Strongly
                                           agree            agree     agree nor  disagree            disagree
                                                                      disagree
1. In most ways my life is close to my     7         6      5         4          3         2         1
   ideal.
2. The conditions of my life are           7         6      5         4          3         2         1
   excellent.
3. I am satisfied with my life.            7         6      5         4          3         2         1
4. So far I have gotten the most           7         6      5         4          3         2         1
   important things I want in life.
5. If I could live my life over, I would   7         6      5         4          3         2         1
   change almost nothing.


Note: All items are added together to provide a total score from 5 to 35, where higher values 
indicate higher life satisfaction: 5-9 extremely dissatisfied, 10-14 dissatisfied, 15-19 slightly 
dissatisfied, 20-24 slightly satisfied, 25-29 satisfied, 30-35 extremely satisfied. 
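
The banding in the note can be written as a simple lookup over the summed score. The sketch below is illustrative, with a hypothetical function name.

```python
def swls_score(responses):
    """Sum five SWLS answers coded 1-7 and return (total, satisfaction band)."""
    if len(responses) != 5 or any(r not in range(1, 8) for r in responses):
        raise ValueError("expected five responses coded 1 to 7")
    total = sum(responses)  # ranges from 5 to 35
    # Upper bound of each band, in ascending order.
    bands = [
        (9, "extremely dissatisfied"),
        (14, "dissatisfied"),
        (19, "slightly dissatisfied"),
        (24, "slightly satisfied"),
        (29, "satisfied"),
        (35, "extremely satisfied"),
    ]
    for upper, label in bands:
        if total <= upper:
            return total, label


if __name__ == "__main__":
    print(swls_score([6, 5, 6, 5, 4]))
```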


Source: Diener, E. et al. (1985[88]), “The Satisfaction with Life Scale”, Journal of Personality
Assessment, Vol. 49/1, pp. 71-75, https://doi.org/10.1207/s15327752jpa4901_13.


The Mental Health Continuum Short-Form (MHC-SF): The MHC-SF is a 14-item 
scale developed by Keyes to capture positive mental health in his dual-continuum 
model (Keyes, 2002[89]). It was derived from the 40-item Mental Health Continuum 
Long Form (MHC-LF), and consists of separate subscales: three “emotional well-being” 
items (reflecting affective well-being plus life satisfaction), five “social well-being” 
items, and six “psychological well-being” items (which when combined reflect 
eudaimonic well-being) (Lamers et al., 2011[90]). Studies have shown high internal 
and moderate test-retest reliability for the MHC-SF and confirmed the 3-factor 
structure of the subscales, which also show convergent validity with corresponding 
aspects of well-being and functioning (Lamers et al., 2011[90]). 


Table 2.23. MHC-SF questionnaire with scoring breakdown 


                                                 Never  Once or  About once  Two or three  Almost     Every
                                                        twice    a week      times a week  every day  day
How often in the past month did you feel ...
Emotional well-being (affect)
1. Happy?                                        0      1        2           3             4          5
2. Interested in life?                           0      1        2           3             4          5
3. Satisfied with your life?                     0      1        2           3             4          5
Social well-being (eudaimonic)
4. That you had something important to           0      1        2           3             4          5
   contribute to society? (social contribution)
5. That you belonged to a community (like a      0      1        2           3             4          5
   social group, your neighbourhood, your city,
   your school)? (social integration)
6. That our society is becoming a better place   0      1        2           3             4          5
   for people like you? (social growth)
7. That people are basically good? (social       0      1        2           3             4          5
   acceptance)
8. That the way our society works makes sense    0      1        2           3             4          5
   to you? (social coherence)
Psychological well-being (eudaimonic)
9. That you liked most parts of your             0      1        2           3             4          5
   personality? (self-acceptance)
10. Good at managing the responsibilities of     0      1        2           3             4          5
    your daily life? (environmental mastery)
11. That you had warm and trusting               0      1        2           3             4          5
    relationships with others? (positive
    relationship with others)
12. That you had experiences that challenged     0      1        2           3             4          5
    you to grow and become a better person?
    (personal growth)
13. Confident to think or express your own       0      1        2           3             4          5
    ideas and opinions? (autonomy)
14. That your life has a sense of direction or   0      1        2           3             4          5
    meaning to it? (purpose in life)


Note: All items are summed, yielding a total score ranging from 0 to 70, with higher scores 
indicating greater levels of positive mental health. Subscale scores range from 0 to 15 for 
emotional well-being, from 0 to 25 for social well-being and from 0 to 30 for psychological
well-being. “Flourishing” is defined by reporting ≥ 1 of 3 emotional signs and ≥ 6 of 11 eudaimonic
signs (social and psychological subscales combined) experienced “every day” or “almost every
day”. “Languishing” is defined by reporting ≥ 1 of 3 emotional signs and ≥ 6 of 11 eudaimonic
signs experienced “never” or “once or twice”. Individuals who are neither flourishing nor 
languishing are categorised as “moderately mentally healthy”. 
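
The categorical scoring in the note can be sketched as below. The sketch rests on two assumptions that are labelled in the code: the thresholds are read as “at least” 1 of 3 emotional signs and “at least” 6 of 11 eudaimonic signs, and the response codes run from 0 (“never”) to 5 (“every day”), so that codes 4-5 count as high and codes 0-1 as low. The function name and returned keys are hypothetical.

```python
def mhc_sf_score(responses):
    """Score fourteen MHC-SF answers coded 0-5, in the item order of the table."""
    responses = list(responses)
    if len(responses) != 14 or any(r not in range(6) for r in responses):
        raise ValueError("expected fourteen responses coded 0 to 5")

    emotional = responses[0:3]        # items 1-3
    social = responses[3:8]           # items 4-8
    psychological = responses[8:14]   # items 9-14
    eudaimonic = social + psychological  # 11 items combined

    # Assumption: 4-5 = "almost every day"/"every day", 0-1 = "never"/"once or twice".
    def count_high(items):
        return sum(1 for r in items if r >= 4)

    def count_low(items):
        return sum(1 for r in items if r <= 1)

    # Assumption: thresholds are "at least 1 of 3" and "at least 6 of 11".
    if count_high(emotional) >= 1 and count_high(eudaimonic) >= 6:
        category = "flourishing"
    elif count_low(emotional) >= 1 and count_low(eudaimonic) >= 6:
        category = "languishing"
    else:
        category = "moderately mentally healthy"

    return {
        "total": sum(responses),  # ranges from 0 to 70
        "emotional": sum(emotional),
        "social": sum(social),
        "psychological": sum(psychological),
        "category": category,
    }


if __name__ == "__main__":
    print(mhc_sf_score([3] * 14))
```

With 11 eudaimonic items, a respondent cannot have six high and six low signs at once, so the flourishing and languishing conditions are mutually exclusive.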


Source: Lamers, S. et al. (2011[90]), “Evaluating the psychometric properties of the Mental Health
Continuum-Short Form (MHC-SF)”, Journal of Clinical Psychology, Vol. 67/1, pp. 99-110,
https://doi.org/10.1002/jclp.2074.


The Warwick-Edinburgh Mental Well-Being Scale (WEMWBS): The 14-item
WEMWBS scale was developed with funding from NHS Health Scotland in 2005 to
measure mental well-being (conceived of as “both feeling good and functioning well”), 
taking the Affectometer 2 instrument as the starting point (Warwick Medical School, 
2021[91]). Some studies confirmed a unidimensional structure for WEMWBS, while 
others identified three residual factors relating to affective well-being, psychological 
functioning or eudaimonia, and social relationships (Shannon et al., 

2020[92]; Koushede et al., 2019[93]). A shorter, 7-item version of the scale, 
SWEMWEBS, is also available, focusing slightly less on affect (Stewart-Brown et al., 
2009[94]). (S)WEMWBS has been validated in various populations and among 
different subgroups, including adolescents, clinical samples and ethnic minority 
samples, and has been translated into more than 25 languages and validated in 
Norwegian, Swedish, Italian, Dutch, Danish, German, French and Spanish. Both scales 
have been shown to be sensitive to changes that occur in mental well-being 
promotion and mental illness treatment and prevention projects (Koushede et al., 
2019[93]). Both instruments can distinguish mental well-being between subgroups, 
but SWEMBS has been found to be less sensitive than the longer version to gender 
differences (Koushede et al., 2019[93]; Ng Fat et al., 2017[95]). 


Table 2.24. (S)WEMWBS questionnaire with scoring breakdown 


Over the last two weeks...                                None of   Rarely    Some of   Often     All of
                                                          the time            the time            the time

1. I’ve been feeling optimistic about the future          1         2         3         4         5
2. I’ve been feeling useful                               1         2         3         4         5
3. I’ve been feeling relaxed                              1         2         3         4         5
4. I’ve been feeling interested in other people           1         2         3         4         5
5. I’ve had energy to spare                               1         2         3         4         5
6. I’ve been dealing with problems well                   1         2         3         4         5
7. I’ve been thinking clearly                             1         2         3         4         5
8. I’ve been feeling good about myself                    1         2         3         4         5
9. I’ve been feeling close to other people                1         2         3         4         5
10. I’ve been feeling confident                           1         2         3         4         5
11. I’ve been able to make up my own mind about things    1         2         3         4         5
12. I’ve been feeling loved                               1         2         3         4         5
13. I’ve been interested in new things                    1         2         3         4         5
14. I’ve been feeling cheerful                            1         2         3         4         5


Note: Items in italics represent the 7-item shorter version of the scale (SWEMWBS). For the 14-item 
scale, all items are summed, yielding a total score ranging from 14-70. For the 7-item scale, raw 
scores are transformed into a 7-35 metric score (see conversion table here: 
https://warwick.ac.uk/fac/sci/med/research/platform/wemwbs/using/howto/swemwbs_raw_score_to_metric_score_conversion_table.pdf). 
For both scales, higher scores 
indicate greater levels of positive mental health. (S)WEMWBS scores approximate to a normal 
distribution, permitting parametric analysis. For categorical scoring, cut-off points for high, average 
and low mental well-being can be generated using two approaches: (1) a statistical approach 
putting the cut-off point at +/- one standard deviation, placing approximately 15% of the sample 
into high well-being and 15% into low well-being categories; or (2) a benchmarking approach 
against validated measures of depression, e.g. a score of 41-44 as indicative of possible/mild 
depression and a score of <41 as indicative of probable clinical depression, using the Center for 
Epidemiologic Studies Depression Scale (CES-D) as a benchmark. WEMWBS is protected by 
copyright. Those wishing to use WEMWBS can obtain a licence to do so. Please go to 
https://warwick.ac.uk/wemwbs/using for information on the type of licence you will require and 
details on how to apply. A free-of-charge “non-commercial” licence is available to public sector 
organisations, charities and registered social enterprises, as well as to researchers employed in 
Higher Education Institutions. Any further enquiries can be directed to wemwbs@warwick.ac.uk. 


Source: Warwick-Edinburgh Mental Well-being Scale (WEMWBS) © NHS Health Scotland, University 
of Warwick and University of Edinburgh, 2006, all rights reserved; Warwick Medical 
School (2021[91]), The Warwick-Edinburgh Mental Wellbeing Scales 
(WEMWBS), https://warwick.ac.uk/fac/sci/med/research/platform/wemwbs/. 
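
As an illustration of the 14-item scoring and of the statistical approach to categorical cut-offs described in the note, the sketch below sums item responses and places cut-offs at one standard deviation either side of the sample mean. It is a hedged example only: the function names are hypothetical, the use of the sample (rather than population) standard deviation is an assumption, and the 7-item metric conversion is not reproduced because it relies on the published conversion table.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def wemwbs_total(responses: Sequence[int]) -> int:
    """Sum the 14 item responses (each coded 1-5), giving a score of 14-70."""
    if len(responses) != 14 or any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("expected 14 responses coded 1-5")
    return sum(responses)


def statistical_cutoffs(totals: Sequence[float]) -> Tuple[float, float]:
    """Return (low, high) cut-offs at mean -/+ one sample standard deviation."""
    centre, spread = mean(totals), stdev(totals)
    return centre - spread, centre + spread


def categorise(total: float, low_cut: float, high_cut: float) -> str:
    """Label a total score relative to the sample-based cut-offs."""
    if total < low_cut:
        return "low"
    if total > high_cut:
        return "high"
    return "average"
```

Because the cut-offs are derived from the sample itself, the resulting categories describe a respondent's position relative to that sample and are not directly comparable across surveys with different score distributions.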
Notes 


1. Of course, this implies that diagnoses reached through clinical interviews are 
only as valid as the classification system they are based on (Mueller and Segal, 
2015[3]) (see also Box 3.4 in Chapter 3). 


2. Of course, the coverage of household surveys is also not complete and includes 
only those sampled. Typically, people living in institutional settings as well as the 
homeless (who are likely to have higher prevalence of mental ill-health than the 
general population) are not taken into account. 


3. The following countries responded to the questionnaire: Australia, Austria, 
Belgium, Bulgaria, Canada, Chile, Colombia, Costa Rica, Czech Republic, Denmark, 
Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Israel, Italy, Japan, 
Korea, Latvia, Lithuania, Luxembourg, Mexico, the Netherlands, Norway, New Zealand, 
Poland, Portugal, Slovenia, the Slovak Republic, Spain, Sweden, Switzerland, Türkiye, 
the United Kingdom and the United States. 


4. The OECD also publishes administrative data on mental health service provision, 
such as the number of psychiatrists, psychologists or mental health professionals per 
100 000 population; the number of hospital beds devoted to mental health care; 
spending on mental health services; etc. (OECD, 2021[17]). As these are not 
considered population-level mental health outcomes, they are not further considered 
for the purposes of this project. 


5. Percentages do not add up to 68% because some countries did both: introduced 
new stand-alone surveys and added mental health modules to existing surveys. 


6. Furthermore, it is worth noting that while the MHI-5 appeared in the well-being 
ad hoc modules for the 2013 and 2018 European Union Statistics on Income and 
Living Conditions (EU-SILC) survey administered by Eurostat, in future well-being 
modules the tool has been removed. Therefore, future use of the MHI-5 may be 
significantly diminished, although some individual member states may elect to keep 
the measure in their own national health and/or well-being surveys. 


7. For an extended discussion of surveys used to measure attitudes and stigma 
towards mental health, refer to Table 6.2 in (OECD, 2021[1]). 


3. Good practices for measuring population 
mental health in household surveys 


Abstract 


All OECD countries currently measure population mental health, yet use a variety of 
tools to capture a multitude of outcomes. In order to improve harmonisation, this 
chapter poses a series of questions that highlight the criteria to be considered when 
choosing appropriate survey tools. These criteria include statistical quality, 
practicalities of fieldwork and data analysis. Overall, there is strong evidence 
supporting the statistical properties of the most commonly used screening tools for 
the composite scales of mental ill-health and positive mental health. Four concrete 
tools (the PHQ-4, the WHO-5 or SWEMWBS, and a question on general mental health 
status) that capture outcomes across the mental health spectrum are suggested for 
inclusion in household surveys in addition to already ongoing data collection efforts. 


Countries across the OECD are already implementing a variety of survey tools to 
measure aspects of population mental health. Chapter 2 highlighted that while there 
is some degree of harmonisation for outcomes such as risk for depression, life 
evaluation and general psychological distress, there are gaps in coverage for others: 
in particular, anxiety; other specific mental disorders (bipolar disorder, PTSD, eating 
disorders and so on); and affect and eudaimonic aspects of positive mental health. 
Before settling on a concrete list of recommendations for member countries, this 
chapter provides an overview of the properties that should be considered when 
selecting a specific tool to measure these outcomes in household surveys. 


While OECD countries are already using a variety of tools, including structured 
interviews, data on previous diagnoses, experienced symptoms and questions on 
suicidal ideation and suicide attempts, this chapter will focus on the statistical 
qualities of three other tools in common use: screening tools1, positive mental health 
indicators and general questions on mental health status. This is for two reasons. 
First, these tools are standardised in terms of question formulation and thus provide 
the easiest foundation on which to make harmonised recommendations. Second, 
these tools are more commonly featured in general social surveys (as compared to 
tools for diagnoses or experienced symptoms), which tend to be collected more 
frequently than health-specific surveys. Taken together, these three tools also provide 
a holistic measure of mental health, encompassing the full possibility of outcomes 
conceptualised by both the single and dual continua (see Figure 1.2), and they 
provide more nuance than, say, measures of suicidal ideation or attempts. These tools 
are then the most promising when thinking of pragmatic recommendations that can 
be taken up by the largest number of countries. 


When selecting an appropriate tool, the overarching consideration is how to measure 
the different facets of mental health most accurately - across countries, groups and 
time - in a way that can be used by government as a part of an integrated policy 
approach to mental health. High-quality data are needed to provide insights into how 
societal conditions (economic, social, environmental) affect the mental health of 
different population groups and whether these conditions contribute to improving or 
declining mental health. No data are completely without measurement bias, and it is 
always important that data collection entities enact rigorous quality controls to 
minimise the amount of noise in a measure. However, there are challenges specific to 
the measurement of mental health, due to stigma and bias affecting survey response 
behaviour, different cultural views and evolving attitudes towards mental health over 
time. Furthermore, household surveys by definition exclude institutionalised 
populations, including those in long-term care facilities, hospitals or prisons, as well as 
people with no permanent addresses, all of whom may have higher-than-average risk 
for some mental health conditions. 


Good practices for measuring mental health at the population level differ in several 
ways from those for measuring mental health at the clinical level. For national 
statistical offices or health ministries conducting large-scale, nationally representative 
surveys, implementing long structured interviews is impractical, even though these 
may be considered the gold standard from a clinical perspective. The end users of the 
data are different, and policy makers have other needs than clinicians: tracking 
overall trends (over time, across at-risk groups, among countries), and factors of risk 
and resilience in population groups vs. diagnosing an individual and developing a 
treatment plan. These needs guide this chapter’s discussion. 


This chapter provides a guide to good practices in producing high-quality data on 
population mental health outcomes, by posing a series of questions for data collectors 
to consider. High-level findings from this exercise are shown in Table 3.1, below. The 
specific screening and composite-scale tools included in the table are those that are 
used most frequently across OECD countries (for more information on each, refer to 
Table 2.7, Table 2.11, Table 2.12 and Annex 2.B).2 Questions are grouped into three 
overarching categories, covering (1) statistical quality, (2) data collection procedures 
and (3) analysis. Evidence from existing research is used to illustrate each question 
area, rather than to comprehensively assess every mental health tool used by OECD 
countries. These framing questions serve as a lens for assessing the advantages and 
disadvantages of different tools for measuring population mental health and to guide 
the concrete recommendations for tool take-up and harmonisation outlined in the 
conclusion. 


Table 3.1. Overview of mental health tool performance on statistical quality, data 
collection and analysis metrics 


Tool | Topic coverage and item length | Reference period | OECD countries reporting its use

[Rating columns of the table (reliability; validity; low missing rates; high comparability across groups; sensitive to change; normal distribution; sensitivity/specificity of thresholds): cell symbols not recoverable from the source layout.]

Validated screening tools for assessing mental ill-health

Psychological distress

General Health Questionnaire (GHQ-12) | Negative and positive affect, somatic symptoms, functional impairment; 12 items | Recently | 5 of 37
Kessler Scale 6 (K6) | Negative affect; 6 items | Past 4 weeks | 4 of 37
Kessler Scale 10 (K10) | Negative affect, functional impairment; 10 items | Past 4 weeks | 4 of 37
Mental Health Inventory 5 (MHI-5) | Negative and positive affect; 5 items | Past month | 28 of 37

Depressive symptoms

Patient Health Questionnaire -8 or -9 (PHQ-8 / PHQ-9) | Negative affect, anhedonia, somatic symptoms, functional impairment (matched to major depressive disorder per DSM-IV and DSM-5 criteria); 8 or 9 items | Past 2 weeks | 30 of 37
Patient Health Questionnaire -2 (PHQ-2) | Negative affect, anhedonia; 2 items | Past 2 weeks | 8 of 37
Center for Epidemiological Studies Depression Scale (CES-D) | Negative affect, anhedonia; 20 items | Past week | 1 of 37

Symptoms of anxiety

Generalised Anxiety Disorder-7 (GAD-7) | Negative affect, somatic symptoms, functional impairment; 7 items | Past 2 weeks | 11 of 37
Generalised Anxiety Disorder-2 (GAD-2) | Negative affect, functional impairment; 2 items | Past 2 weeks | 7 of 37

Symptoms of depression and anxiety

Patient Health Questionnaire -4 (PHQ-4) | Negative affect, anhedonia, functional impairment; 4 items | Past 2 weeks | 13 of 37

Standardised tools for assessing positive mental health

Short Form Health Status (SF-12) | Negative and positive affect, functional impairment (Mental Health Component Summary); 12 items | Past 4 weeks | 8* of 37
Warwick-Edinburgh Mental Well-Being Scale (WEMWBS) | Positive affect, eudaimonia, social well-being; 14 items | Past 2 weeks | 2 of 37
Short Warwick-Edinburgh Mental Well-Being Scale (SWEMWBS) | Positive affect, eudaimonia, social well-being; 7 items | Past 2 weeks | 6 of 37
WHO-5 Wellbeing Index (WHO-5) | Positive affect; 5 items | Past 2 weeks | 6 of 37
Mental Health Continuum Short-Form (MHC-SF) | Positive affect, eudaimonia, life satisfaction, social well-being; 14 items | Past month | 2 of 37

Single-question self-reported general mental health status

Self-reported mental health (SRMH) | Varies widely, including self-reported: general mental health status; number of mentally healthy days; recovery from mental health condition; satisfaction with mental health; extent to which mental health interferes in daily life; single question | Varied (ranges from current assessment to last 12 months) | 23 of 37

Note: {j indicates that the evidence shows this tool performs well on this dimension; ~ indicates 
that the evidence shows this tool performs only fairly; [j indicates that the evidence shows this tool 
performs poorly; and © indicates that evidence is limited or missing. If a cell is blank, this means 
that no research on this tool / topic combination was reviewed for this publication. * Refers to the 
fact that Germany included the longer SF-36 (rather than the shorter SF-12) in its 1998 German 
National Health Interview and Examination Survey, however the instrument will not be used in 
future due to licensing fees. Refer to Annex 2.A and Annex 2.B for more information about each 
tool. Country coverage refers to all OECD countries except Estonia, which did not participate in the 
questionnaire. 


Source: Literature reviewed in this chapter; Responses to a questionnaire sent to national 
statistical offices in January 2022. 


Statistical quality 


A suitable measurement instrument for population mental health should perform well 
across a range of statistical qualities, including reliability, validity, ability to 
differentiate between different latent constructs, minimal non-response or refusals, 
comparability across groups and sensitivity to change. In addition, practical 
considerations surrounding a tool are important, such as keeping it short enough in 
length, with low redundancy between question items, so as to avoid respondent 
fatigue. These qualities interact with one another, meaning that in practice the goal is 
to balance the trade-offs of each in order to find a sensible solution. An instrument 
that performs well in one quality criterion - e.g. validity - may perform poorly in 
another - e.g. length of the questionnaire and/or non-response rates. Thus, before 
choosing a metric, it is important for survey producers to weigh the costs and benefits 
of each approach to identify a tool suitable for their context. 


How reliable are survey measures of mental health? 


Measures of population mental health should produce consistent results when an 
individual is interviewed or assessed under a given set of circumstances. This 
concept, called reliability, is about ensuring that any changes detected in outcomes 
have a low likelihood of being due to problems with the tool itself - i.e. measurement 
error - and instead reflect actual underlying changes in the individual’s mental health 
(Box 3.1). 


Box 3.1. Statistical definitions: Reliability 


Two important aspects of reliability are test-retest reliability and internal consistency 
reliability (OECD, 2013[1]; OECD, 2017[2]). 


Test-retest reliability concerns a scale’s stability over time. A respondent is re- 
interviewed or re-assessed after a period of time has passed, and their responses to a 
given questionnaire item are compared to one another. The expectation is that 
(assuming no change in the underlying state being measured) a reliable measure 
should lead to responses that are highly correlated with one another. There is no fixed 
rule for the length of time between the initial interview and follow-up: practice ranges 
from as short as 2-14 days to six months, depending on the assessment type (NHS 
Health Scotland, 2008[3]). 


The test-retest criterion must be applied thoughtfully in the case of mental health 
measurement instruments, as mental health states (and particularly affective states) 
can fluctuate over short periods of time for a given individual. This means that 
measurement instruments addressing specific symptoms or states can be highly 
reliable yet still produce different results for the same individual over a period of days 
or weeks, as symptoms and experiences themselves wax and wane. In the context of 
measuring population mental health outcomes, then, test-retest reliability is 
particularly relevant for: 


• Simple measures that concern whether an individual has ever been diagnosed 
with a mental health condition (where a good instrument should have a very high 
test-retest correlation) 


• Establishing whether a short-form measure (or a measure being validated 
against a clinical diagnosis) is performing with the same test-retest accuracy as a 
long-form measure (or clinical diagnosis) when the two are administered to the same 
respondent, and/or 


• Establishing the broad stability of symptom-based measurement scales over 
short time periods and across large samples - i.e. while the test-retest correlation of 
questions for a set of symptoms is unlikely to be perfect for a given individual (if 
symptoms themselves are not always stable), day-to-day fluctuations in symptoms at the 
individual level can be expected to wash out across large samples to produce a similar 
distribution of scores over a short time period.2 


Assessing test-retest reliability therefore indicates a trade-off between measures that 
are sufficiently stable, yet sensitive to change over time. An instrument that performs 
well on test-retest reliability may perform poorly on tests to measure sensitivity to 
change, which underscores the importance of looking at statistical quality measures 
holistically when making decisions as to which tools to implement. 
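
The test-retest comparison described above amounts to correlating the scores of the same respondents at two points in time. The sketch below is illustrative only: it uses a plain Pearson correlation written out by hand, which is one common choice (intraclass correlations are another), and the two lists of scores are invented values for five hypothetical respondents rather than data from any study.

```python
from math import sqrt
from typing import Sequence


def pearson_r(first: Sequence[float], second: Sequence[float]) -> float:
    """Pearson correlation between paired scores from two interview waves."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long lists with at least 2 values")
    mean_1 = sum(first) / len(first)
    mean_2 = sum(second) / len(second)
    cross = sum((a - mean_1) * (b - mean_2) for a, b in zip(first, second))
    ss_1 = sum((a - mean_1) ** 2 for a in first)
    ss_2 = sum((b - mean_2) ** 2 for b in second)
    if ss_1 == 0 or ss_2 == 0:
        raise ValueError("scores must vary within each wave")
    return cross / sqrt(ss_1 * ss_2)


# Hypothetical total scores for the same five respondents, two weeks apart.
wave_1 = [10, 12, 15, 20, 22]
wave_2 = [11, 12, 14, 21, 20]
retest_r = pearson_r(wave_1, wave_2)
```

A coefficient close to 1 would indicate that respondents keep their relative position between waves; as noted above, a lower value may reflect genuine change in symptoms rather than a weakness of the instrument.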


Internal consistency reliability assesses the extent to which individual items 
within a survey tool are correlated to one another when those items aim to capture 
the same target construct. In the context of measuring population mental health, this 
might mean that, in a battery of items designed to measure depression and anxiety, 
the depression items correlate with one another, and the anxiety items correlate with 
one another (see also Box 3.3 for a discussion of factorial validity). The most widely 
used coefficient for internal consistency reliability is Cronbach's alpha, which is a 
function of the total number of question items, the covariance between pairs of 
individual items and the variance of the overall score.1 Although there is not universal 
consensus, most researchers agree that a coefficient value between 0.7 and 0.9 is 
ideal (NHS Health Scotland, 2008[3]). Values below 0.7 may reflect the fact that items 
within the scale are not capturing the same underlying phenomenon (OECD, 2013[1]), 
while values above 0.9 may indicate that the scale has redundant items. 
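
To make the description of Cronbach's alpha concrete, the following sketch computes it from respondent-level item scores using the standard variance form, alpha = k/(k-1) × (1 - sum of item variances / variance of the total score), where k is the number of items. It is a minimal illustration: the function name is hypothetical, complete responses are assumed (no missing items), and sample variances are used throughout.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items table of scores.

    rows: one sequence per respondent, each holding that respondent's
    answers to the same k items (no missing values assumed).
    """
    if len(rows) < 2:
        raise ValueError("need at least two respondents")
    k = len(rows[0])
    if k < 2 or any(len(row) != k for row in rows):
        raise ValueError("every respondent needs the same k >= 2 items")

    item_columns = list(zip(*rows))                    # one tuple per item
    item_variances = [variance(col) for col in item_columns]
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("total scores do not vary")

    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)
```

When all items move together perfectly the variance of the total score is much larger than the sum of the item variances and alpha approaches 1; when items are unrelated the two quantities coincide and alpha is 0.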


Notes: 


1. The Cronbach coefficient alpha is commonly used in the literature to assess the 
internal consistency reliability of multi-item tools. The coefficient is calculated by 
multiplying the mean paired item covariance by the square of the total number of items 
included in the scale and dividing this result by the sum of all elements in the 
variance-covariance matrix (OECD, 2013[1]). This results in a coefficient ranging from 0 
(scale items are completely independent from one another, no covariance) to 1 (scale items 
overlap, complete covariance). 


2. The definition of “a short time period” is subjective and can vary depending on 
circumstance. For example, although the period of a couple of days may be deemed an 
acceptably short period of time over which a test-retest assessment could be 
administered, if there were to be an extreme shock in the intervening days, either 
positive or negative, there would be good grounds to expect change in the underlying 
distribution. Frequent data collection on mental health during the COVID-19 pandemic 
illustrated the volatile nature of many affect-based measures, with large spikes coinciding 
with the introduction / easing of confinement policies. 


The performance of screening tools on measures of reliability varies across tools and 
the outcomes they measure. There are mixed findings for general measures of 
psychological distress. The General Health Questionnaire (GHQ-12) as well as the 
Short Form-36 (SF-36) and its shorter sub-component, the Mental Health Inventory 
(MHI-5), have been shown to have good reliability (Schmitz, Kruse and Tress, 
2001[4]; Ohno et al., 2017[5]; Elovanio et al., 2020[6]; Strand et al., 2003[7]); 
however, while the longer Kessler (K10) has been shown to be internally consistent, 
the test-retest reliability of the shorter Kessler (K6) tool has not been assessed in any 
studies (El-Den et al., 2018[8]; Easton et al., 2017[9]). 


Conversely, screening tools for specific mental conditions - especially depression - are 
the most studied, and they have been shown to be reliable in terms of both test-retest 
reliability and internal consistency reliability. A meta-analysis of 55 different screening 
tools for depression found the Patient Health Questionnaire (PHQ-9) to be the most 
evaluated tool, with a number of studies concluding that both it and the PHQ-8 (a 
shorter version with the final question on suicidal ideation removed) have high 
reliability and validity (El-Den et al., 2018[8]). The same report, however, found that 
the shorter Patient Health Questionnaire-2 (PHQ-2) lacked consistent data on validity 
and reliability: among the six reports that evaluated the PHQ-2, only one reported on 
its internal consistency or test-retest reliability (El-Den et al., 2018[8]), which led the 
authors to caution that the reliability of the PHQ-2 cannot be confirmed with available 
data. The Center for Epidemiological Studies Depression Scale (CES-D), although less 
studied than the PHQ, has also been found to have good reliability on both 
metrics (Ohno et al., 2017[5]). Among anxiety tools, the Generalised Anxiety Disorder 
screeners (both the longer GAD-7 and shorter GAD-2) have been found to be reliable, 
with good test-retest and internal consistency reliability (Ahn, Kim and Choi, 
2019[10]; Spitzer et al., 2006[11]). 


A study of the Patient Health Questionnaire-4 (PHQ-4), which combines the PHQ-2 and 
GAD-2 to generate a composite measure of both depression and anxiety, found lower, 
yet still acceptable, reliability (Cronbach’s alpha > 0.80 for both sub-scales) (Kroenke 
et al., 2009[12]). Another study of the PHQ-4 found lower item-intercorrelations but 
deemed the reliability to be acceptable given the short length of the scales (Löwe 
et al., 2010[13]).3 Because Cronbach’s alpha is in part a function of the total number 
of items (refer to Box 3.1), shorter scales will perform worse on tests of internal 
consistency by construction. However, shorter measures, with less redundancy 
between question items, are often preferred by survey creators, as they entail a lower 
burden for respondents. 
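
The dependence of alpha on scale length can be illustrated with the standardised form of the coefficient, alpha = k × r / (1 + (k - 1) × r), where k is the number of items and r the average inter-item correlation. The sketch below holds r fixed at 0.5, an arbitrary value chosen purely for illustration, and varies k across lengths similar to those of the tools discussed here; the function name is hypothetical.

```python
def standardised_alpha(n_items: int, mean_inter_item_r: float) -> float:
    """Standardised Cronbach's alpha for k items with average correlation r."""
    if n_items < 2:
        raise ValueError("alpha is defined for scales with at least 2 items")
    k, r = n_items, mean_inter_item_r
    return (k * r) / (1 + (k - 1) * r)


# Same average inter-item correlation (assumed 0.5), different scale lengths.
for k in (2, 4, 7, 14):
    print(k, round(standardised_alpha(k, 0.5), 3))
# 2 0.667
# 4 0.8
# 7 0.875
# 14 0.933
```

With item quality held constant, a two-item scale falls just short of the conventional 0.7 benchmark while a 14-item scale exceeds 0.9, which is why alpha values for very short screeners need to be read in light of their length.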


Composite scales capturing aspects of positive mental health have also been found to 
be reliable. A study of the 14-question Warwick-Edinburgh Mental Well-Being Scale 
(WEMWBS) tool found it to have high test-retest reliability (0.83 at one week) and a 
high Cronbach's alpha (around 0.9) (Tennant et al., 2007[14]; NHS Health Scotland, 
2016[15]). The authors cautioned, though, that the high Cronbach’s alpha suggests 
some redundancy in the scale items, a concern that led to the development of the 
shorter seven-item version (SWEMWBS) (Tennant et al., 2007[14]; NHS Health 
Scotland, 2016[15]). Multiple studies of WEMWBS and SWEMWBS found them both to 
have strong test-retest reliability (Stewart-Brown, 2021[16]; Shah et al., 2021[17]). 
The World Health Organization-5 (WHO-5) composite scale has also been tested for 
reliability in a variety of settings (Dadfar et al., 2018[18]; Garland et al., 2018[19]). 
Similarly, the MHC-SF has been found to have high internal reliability, though its test- 
retest reliability is only moderate (Lamers et al., 2011[20]). 


Fewer studies have investigated the reliability of general self-reported indicators of 
mental health status; however, evidence from the United States suggests that these 
measures have acceptable test-retest reliability. The health-related quality-of-life tool 
used in the Behavioral Risk Factor Surveillance System (BRFSS) survey, run by the United 
States Centers for Disease Control and Prevention, measures perceived health by combining 
physical and mental health. A study in the state of Missouri found that the shorter 
version of the tool, with four items, has acceptable test-retest reliability and strong 
internal validity, although reliability was lower among older adults (Moriarty, Zack and 
Kobau, 2003[21]). 


Box 3.2. Key messages: Reliability 


• Most mental health screening tools, including both surveys that identify specific 
mental disorders and those that identify positive mental health, have been found to have 
strong reliability, as measured through both test-retest and internal consistency 
measures. 

• Test-retest reliability must be considered in tandem with a measure’s sensitivity 
to change over time, rather than blindly applied as a quality criterion. 

• There is strong evidence for the reliability of screening tools (especially those 
focusing on depression) and, to a somewhat lesser extent, positive mental health 
composite scales. However, fewer studies have been done to assess the reliability of 
general self-reported indicators of mental health status; more research is needed in this 
area. 


How well does the tool measure the targeted outcome? 


In addition to being reliable, a good measurement instrument must be valid, i.e. the 
measures provided by the tool should accurately reflect the underlying concept. For 
indicators that are more objective, validity can be assessed by comparing the self- 
reported measure against an objective measure of the same construct. For example, 
respondents’ self-reported earnings could in theory be cross-checked with their tax 
returns, or pay slips, to ascertain whether their response was reported accurately. Of 
course there are practical reasons that prevent this from being done systematically, 
but this illustrates that there are ways of assessing the validity of self-reported 
earnings data. Conversely, it is not possible to ascertain the “objective truth” of a 
subjective indicator, such as subjective well-being, trust or indeed mental health. This 
does not mean that validity cannot be assessed: OECD measurement guidelines use 
the concepts of face validity, convergent validity and construct validity to assess the 
validity of subjective indicators (OECD, 2013[1]; OECD, 2017[2]) (Box 3.3). 


Unlike many subjective indicators, the bulk of screening tools to assess mental health 
have been validated against diagnostic interviews for common mental disorders, 
which provide a rigorous assessment of their accuracy and real-world meaning. The 
most common diagnostic interview against which mental health screening tools are 
validated is the World Health Organization’s Composite International Diagnostic 
Interview (WHO-CIDI), which was designed for use in epidemiological studies as well 
as for clinical and research purposes (see Chapter 2 for more details). This tool makes 
it possible to measure the prevalence of mental disorders, the severity of these disorders, 
their impact on home management, work-life balance, relationships and social life, as well 
as mental health service and medication use. Although the CIDI is widely accepted as 
a gold standard against which mental health survey items should be assessed, it is 
not immune to criticisms and validity concerns (Box 3.4). 


Box 3.3. Statistical Definitions: Validity 


Validity is more difficult to ascertain than reliability, especially for subjective data for 
which an objective truth is unknowable, and which typically cannot be compared to an 
equivalent objective measure. Three ways of assessing validity for subjective 
measures include face validity, convergent validity and construct validity. 


Face validity evaluates whether the indicator makes intuitive sense to the 
respondent and to (potential) data users. One way to indirectly measure face validity 
is through non-responses. High levels of non-response may indicate that respondents 
do not understand or see the relevance or usefulness of the question. In the case of 
mental health, high levels of non-response may also reflect a degree of discomfort 
with the topic due to stigma and bias, rather than lack of face validity. (An extended 
discussion of non-response and mental health measures appears later in this chapter.) 
Cognitive interviewing can also be used. 


Convergent validity is assessed by how well the indicator correlates with other 
proxies of the same underlying outcome. Using mental health tools as an example, 
were a researcher to introduce a new tool to assess anxiety, s/he could test its 
convergent validity by comparing it to pre-existing screening tools for data on anxiety, 
diagnosis or mental health service use, self-reported assessments of anxiety level, 
and/or bio-physical markers of stress and anxiety (heart rate, blood pressure, 
neuroimaging, etc.). 


Construct validity is the extent to which the indicator performs in accordance with 
existing theory or literature. For example, research shows that mental health and 
physical health are correlated with one another and co-move. Therefore, if a new 
mental health tool showed little correlation with physical health, or if changes in 
mental health as measured by this tool did not reflect any changes in physical health, 
the scale would be suspected of having low construct validity. The growing literature 
on the social determinants of health can also be leveraged to assess construct 
validity, in a similar way. 


In addition to the three aspects of validity mentioned above, clinical validations of 
mental health survey items often refer to three additional assessments: criterion 
validity, factorial validity and cross-group validity. 


Criterion validity exists only when there is a gold standard against which an item 
can be compared. In the case of mental health, this gold standard is typically a 
structured interview (e.g. the CIDI, refer to Annex 2.B). Criterion validity assesses the 
psychometric properties of a measure, i.e. how it compares to the gold standard. A 
measure is said to be sensitive if it can accurately identify a “true positive” (i.e. how 
often the survey accurately identifies someone at risk of, say, depression); it is 
specific if it can accurately identify a “true negative” (i.e. it accurately identifies 
someone as not at risk for depression). In order to establish diagnostic accuracy, 
sensitivity and specificity are plotted in a receiver operating characteristic (ROC) 
curve at various thresholds. The area under the curve (AUC) can then be used to 
assess the diagnostic performance of the screening tool in comparison to the gold 
standard.1 
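As an illustration of the criterion-validity calculation described above, the following minimal Python sketch computes sensitivity and specificity at a given cut-off, traces the ROC curve (sensitivity against 1 - specificity), and derives the AUC. The scores and gold-standard diagnoses are invented for illustration, and picking the cut-off by Youden's J (sensitivity + specificity - 1) is one common convention assumed here, not a method prescribed in this chapter.

```python
def sensitivity_specificity(scores, truth, cutoff):
    """Treat 'score >= cutoff' as a positive screen and compare with the gold standard."""
    tp = sum(1 for s, t in zip(scores, truth) if s >= cutoff and t)
    fn = sum(1 for s, t in zip(scores, truth) if s < cutoff and t)
    tn = sum(1 for s, t in zip(scores, truth) if s < cutoff and not t)
    fp = sum(1 for s, t in zip(scores, truth) if s >= cutoff and not t)
    return tp / (tp + fn), tn / (tn + fp)


def roc_points(scores, truth):
    """(false positive rate, true positive rate) at every candidate cut-off."""
    cutoffs = sorted(set(scores)) + [max(scores) + 1]
    points = []
    for c in cutoffs:
        sens, spec = sensitivity_specificity(scores, truth, c)
        points.append((1 - spec, sens))
    return sorted(points)


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


def best_cutoff(scores, truth):
    """Cut-off maximising Youden's J (an assumed selection rule)."""
    def youden(c):
        sens, spec = sensitivity_specificity(scores, truth, c)
        return sens + spec - 1
    return max(sorted(set(scores)), key=youden)


# Hypothetical screening scores and structured-interview diagnoses (1 = case).
scores = [2, 4, 5, 7, 9, 10, 12, 14, 17, 20]
truth = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]

print("AUC:", auc(roc_points(scores, truth)))
print("Cut-off by Youden's J:", best_cutoff(scores, truth))
```

In practice the choice of cut-off also depends on the relative cost of false positives and false negatives, which this sketch treats as equal.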


Factorial validity assesses whether a multi-item survey tool is measuring one, or 
several, underlying concepts. In almost all cases, unidimensionality is desired if only a 
single construct is being assessed; this provides assurance that the mental health tool 
is measuring, for example, depression, anxiety or latent well-being. However, if a 
scale is assessing multiple dimensions of mental health, then multidimensionality is 
desired. For example, factor assessments for the PHQ-4, which measures depression 
and anxiety, indeed identify two latent factors (Lowe et al., 2010[13]). Factorial 
validity is commonly assessed using either confirmatory factor analysis (CFA) or 
exploratory factor analysis (EFA). In the former, researchers test a hypothesis that the 
relationship between an observed variable (e.g. respondents’ answers to the PHQ-8 
tool) and an underlying latent construct (e.g. depression) fits a given model. That is, 
using CFA, researchers test the hypothesis that an observed dataset has a given 
number of underlying latent factors. Using EFA, researchers do not impose a 
theoretical model and instead work backwards to uncover the underlying factor 
structure (Suhr, 2006[22]). 
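A rough way to see how the number of underlying factors can be read off item data is to inspect the eigenvalues of the inter-item correlation matrix. The sketch below applies the Kaiser rule of thumb (retain factors with eigenvalue above 1), which is a common first step in exploratory work but is an assumption here and not a substitute for a full EFA or CFA. The correlation matrices are hypothetical, and the eigenvalues are obtained by simple power iteration with deflation to keep the example dependency-free.

```python
import math


def _matvec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def dominant_eigenpair(a, iterations=500):
    """Largest eigenvalue (and eigenvector) of a symmetric matrix by power iteration."""
    v = [1.0 / (i + 1) for i in range(len(a))]  # arbitrary starting vector
    for _ in range(iterations):
        w = _matvec(a, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(x * y for x, y in zip(v, _matvec(a, v)))
    return eigenvalue, v


def count_factors_kaiser(corr):
    """Count eigenvalues of a correlation matrix that exceed 1."""
    n = len(corr)
    a = [list(row) for row in corr]
    retained = 0
    for _ in range(n):
        eigenvalue, v = dominant_eigenpair(a)
        if eigenvalue <= 1.0:
            break
        retained += 1
        # Deflation: remove the component just found before searching again.
        a = [[a[i][j] - eigenvalue * v[i] * v[j] for j in range(n)] for i in range(n)]
    return retained


# Hypothetical four-item tool: items 1-2 and items 3-4 form two clusters.
two_cluster = [
    [1.00, 0.64, 0.20, 0.20],
    [0.64, 1.00, 0.20, 0.20],
    [0.20, 0.20, 1.00, 0.64],
    [0.20, 0.20, 0.64, 1.00],
]
# Hypothetical four-item tool in which all items correlate equally.
one_cluster = [[1.0 if i == j else 0.6 for j in range(4)] for i in range(4)]

print(count_factors_kaiser(two_cluster), count_factors_kaiser(one_cluster))
```

Dedicated statistical software would normally be used for this step; the point of the sketch is only that a two-cluster correlation pattern yields two large eigenvalues, while a uniform pattern yields one.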


Cross-group validity, or cultural validity, refers to the extent to which a measure is
applicable across different population groups. There are a range of ways in which a lack
of cross-group validity can bias mental health outcome measures, including through cultural
factors affecting the way in which symptoms are expressed, clinical bias (either
implicit or explicit), language limitations of the respondent (if the tool is being
implemented in a language other than their mother tongue) and differences in
response behaviour (e.g. greater likelihood to choose midpoint values on Likert scales
rather than extreme values) (Leong, Priscilla Lui and Kalibatseva, 2019[23]). Cross-group
validity is best ensured by validating a survey tool in the requisite population,
rather than applying it blindly.


Notes: 


1. A receiver operating characteristic curve provides a visualisation of diagnostic ability
by plotting the true positive rate against the false positive rate. The curve can be used to
determine the optimal cut-off point, which minimises both Type I (false positive) and
Type II (false negative) errors. ROC analysis is used in determining the threshold cut-off
scores, which are discussed later in this chapter. For more information on ROC and its use
in clinical psychology, refer to Pintea and Moldovan (2009[24]) and Streiner and Cairney
(2007[25]).


To assess the validity of screening tools, researchers typically implement a study in 
which respondents both answer the self-reported scale and participate in a structured 
CIDI interview, with their responses to both then compared. A screening tool with high 
sensitivity and specificity is said to have high criterion validity. Although criterion 
validity ensures that screening tools are designed to mirror diagnostic outcomes from 
the CIDI, screening tools by design estimate higher prevalence rates for specific 
mental disorders (see Box 2.1). Convergent validity is assessed by comparing
different screening tools against one another to see whether a new tool for 
measuring, say, depression, performs similarly to existing measures for depression. 
This approach is often used when testing shortened versions of screening tools, to see 
whether the truncated survey performs as well as its longer, more in-depth, 
predecessor. The majority of screening tools described in this chapter have been 
validated against diagnostic interviews for common mental disorders and have 
reported good psychometric properties (high sensitivity and specificity) across age 
groups, gender and socio-economic status (Gill et al., 2007[26]; O’Connor and 
Parslow, 2010[27]; Huang et al., 2006[28]) (Box 3.3). 


Box 3.4. Validity of structured interviews 


One important caveat to using structured interviews to validate screening tools is that 
it presupposes the structured interviews to be an accurate measure of “true” 
underlying mental health. This issue is raised in two different contexts: (1) most 
screening tools used in OECD countries were validated against the fourth version of 
the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV), published in 1994, 
which is now outdated, rather than against the DSM-5, published in 2013; and (2) the 
extent to which the DSM itself provides accurate diagnostic data cross-culturally. 


The first concern relating to the validity of structured interviews has to do with the 
fact that none of the screening tools commonly in use have been validated against 
the newer DSM version (Statistics Canada, 2021[29]). Yet, in total, there are 464 
differences between the DSM-IV and DSM-5. Broadly speaking, the DSM-5 includes 
fewer diagnostic categories, as many previously separate disorders share a number of 
features or symptoms. In addition, greater effort was made to separate an individual’s 
functioning status from their diagnosis. One area that could have an impact moving 
forward is the lowering of the diagnostic threshold for generalised anxiety disorder - a 
move that has been criticised by some psychiatrists for pathologising what had 
previously been considered quotidian worries (Murphy and Hallahan, 2016[30]). 


In sum, even though there are always changes between DSM updates that include the 
restructuring of diagnostic categories and the updating of some diagnosis criteria, 
there is by design a degree of continuity between different DSM versions, and most 
changes are minor. Regardless, in order to be up to date with most recent clinical 
practice, instruments like the CIDI would benefit from an update. 


On the second point, there are concerns about the applicability of these diagnostic 
validations to non-US regions and population groups, which at the very least would 
require validation studies to be conducted in different local contexts. Beyond this, 
validating mental health screening tools in more geographically diverse clinical 
settings may be insufficient if the clinical diagnoses underpinning the validation are
themselves flawed. Haroz and colleagues investigated the extent of this cross-cultural
bias by reviewing 138 qualitative studies of depression reflecting 77 different 
nationalities and ethnicities (Haroz et al., 2017[31]). They found that only 7 of the 15 
most frequently mentioned features of depression across non-Western populations 
reflect the DSM-5 diagnosis of Major Depressive Disorder. DSM-specified diagnostic 
features including “problems with concentration” and “psychomotor agitation or 
slowing” did not appear frequently, while features including “social isolation or 
loneliness”, “crying”, “anger” and “general pain” - none of which are included as 
diagnostic criteria - did. Some features arose more frequently in certain regions: 
“worry” in South and Southeast Asia, and “thinking too much” in Southeast Asia and 
sub-Saharan Africa. This implies that the close alignment of the PHQ-9 or the GAD-7 
with the DSM criteria could in theory limit detection of the underlying targeted 
construct (i.e. depression, anxiety) relative to longer or more comprehensive 
screening tools and/or structured interviews (Ali, Ryan and De Silva,
2016[32]; Sunderland et al., 2019[33]).


Although criticisms of DSM criteria do exist, the DSM still remains the most useful tool 
for enabling cross-country comparative data on mental health outcomes. While 
improvements could be made, the DSM includes considerations of cultural validity in 
its drafting, which are updated in each subsequent iteration. 


Moving beyond clinical psychology, a few OECD countries have expanded their 
definition of mental health to encompass a wider range of viewpoints, beyond the 
traditional ones rooted in a Western perspective. In New Zealand, for example, the 
Government Inquiry into Mental Health and Addiction includes a Māori perspective of
mental health (New Zealand Government, 2018[34]). In a similar vein, the Swedish
government will solicit feedback from the Sámi parliament when drafting its upcoming
strategy on mental health (Public Health Agency of Sweden, 2022[35]). 


Across measures of general psychological distress, the Kessler and MHI-5 scales have 
stronger criterion validity than the GHQ-12. Studies have found that the K10 and K6 
scales have strong psychometric properties (encompassing both reliability and 
validity) and better overall discriminatory power than the GHQ-12 in detecting 
depressive and anxiety disorders (Furukawa et al., 2003[36]; Cornelius et al., 
2013[37]). The mental health component of the SF-12 tool is also better able to 
discriminate between those with and those without specific mental health conditions, 
as compared to the GHQ-12 (Gill et al., 2007[26]). While the MHI-5 tool has been found
to be just as valid as the longer MHI-18 and GHQ-30 in assessing a number of mental
health conditions, including major depression and anxiety disorders, it performed less
well than the MHI-18 for the full range of affective disorders (Berwick et al., 1991[38]).
Although the MHI-5 was designed as a general tool, it has proven effective in
identifying a specific risk for depression and/or anxiety (Yamazaki, Fukuhara and Green,
2005[39]; Rivera-Riquelme, Piqueras and Cuijpers, 2019[40]). 


A recent meta-analysis of the sensitivity and specificity of instruments used to 
diagnose and grade the severity of depression reported that, on average, the PHQ-9 
demonstrated the highest sensitivity and specificity relative to other screening tools, 
including the CES-D (Pettersson et al., 2015[41]). A variant of the PHQ-8 has
been used in the CDC Behavioral Risk Factor Surveillance System (BRFSS) survey. This
measure, referred to as the PHQ-8 days, asks respondents how many days over the
past two weeks they have experienced each of the eight depressive symptoms that
make up the PHQ-8. This yields a scale ranging from 0 to 112 (eight items, each scored
from 0 to 14 days) and can provide a look at
depression risk that is more granular - better identifying individuals who may be at
risk for mild depression but currently have higher levels of mental well-being - and
also more sensitive to change (Dhingra et al., 2011[42]). The PHQ-2 has been
assessed for its internal consistency, construct validity and convergent
validity; however, a meta-analysis did not find evidence of studies of criterion
validity (El-Den et al., 2018[8]). Another overview cites evidence for the PHQ-2 as
having good criterion validity for specific populations such as older adults, pregnant or
post-partum women, and patients with specific conditions such as coronary heart
disease or HIV/AIDS (Lowe et al., 2010[13]).
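The scoring of the PHQ-8 days measure described above can be sketched in a few lines of Python. The stated 0-112 range across eight items implies a maximum of 14 days per item (8 x 14 = 112); that per-item ceiling is inferred from the range and encoded below as an assumption, and the function name is illustrative only.

```python
N_ITEMS = 8          # eight depressive symptoms in the PHQ-8
MAX_DAYS = 14        # assumed per-item maximum, inferred from 112 / 8


def phq8_days_score(days_per_item):
    """Sum the number of days reported for each of the eight symptoms."""
    if len(days_per_item) != N_ITEMS:
        raise ValueError("expected one answer per PHQ-8 item")
    for days in days_per_item:
        if not 0 <= days <= MAX_DAYS:
            raise ValueError("days must lie between 0 and 14")
    return sum(days_per_item)


# Hypothetical respondent reporting a few symptomatic days.
example = [3, 0, 7, 2, 0, 1, 0, 0]
print(phq8_days_score(example))
```

Because every additional symptomatic day moves the total, the days-based score can register small changes that the four-category response format of the standard PHQ-8 would not.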


While self-report scales for depressive symptoms tend to be well validated, scales for 
anxiety disorders have been found to be somewhat less sensitive and specific in 
clinical populations. Research suggests this may be because different types of anxiety 
disorders have more heterogeneous symptoms than depressive disorders (Rose and 
Devine, 2014[43]). Despite this, both the GAD-7 and GAD-2 have been validated in a 
number of studies. The GAD-7 was designed to provide a brief clinical measure of 
generalised anxiety disorder, and its validation exercise found it to have good validity 
(criterion, construct, factorial, etc.). Furthermore, factorial validity assessments of the 
GAD-7 and PHQ-8 found that, despite a high correlation between the anxiety and 
depression scales (0.75), the two scales are complementary and not duplicative; more 
than half of patients with high levels of anxiety did not also have high levels of 
depression (Spitzer et al., 2006[11]). The high correlations of the GAD-7 with two 
other anxiety scales indicated good convergent validity (Kroenke et al., 2007[44];
Spitzer et al., 2006[11]).4 Both the GAD-7 and the shorter GAD-2 perform
well in detecting all four major forms of anxiety disorders: generalised anxiety
disorder, panic disorder, social anxiety disorder and post-traumatic stress
disorder (Kroenke et al., 2007[44]).


The PHQ-4 has been found to be a valid tool for measuring the combined presence of 
risks for both depression and anxiety. As noted above, its component parts - the PHQ-2
and GAD-2 - have been validated against diagnostic criterion standard interviews
(with caveats to the broader applicability of PHQ-2 criterion validity, as mentioned
above). Studies have shown that PHQ-4 scores are associated with the SF-20
functional status scale and with health information such as disability days,
providing evidence for convergent and construct validity. Furthermore, factorial
analysis has found that the PHQ-4 has a two-dimensional structure with two discrete 
factors, picking up on both depression and anxiety disorders (Lowe et al., 2010[13]). 


Composite scales capturing aspects of positive mental health have also been found to 
have good validity. WEMWBS was found to have good criterion and convergent 
validity, being highly correlated with other scales that capture positive affect. 
WEMWBS and the WHO-5 are, unsurprisingly, highly correlated with one another 
(correlation coefficient of 0.77) (NHS Health Scotland, 2016[15]), with WEMWBS being
slightly less correlated with other measures of mental health that had a stronger focus
on physical health or psychological distress (including the GHQ-12). Another study on
WEMWBS found that the shorter version of the screening test was highly correlated
with the longer version, making it an efficient and quicker alternative to the longer
14-question version (Stewart-Brown et al., 2009[45]). Despite its length, Rasch analysis
has found that WEMWBS is unidimensional with one underlying factor (Stewart-Brown,
2021[16]).5 Multiple studies have shown the MHC-SF has good convergent
validity (Guo et al., 2015[46]; Petrillo et al., 2015[47]; Lamers et al., 2011[20]);
however, cognitive interviews in Denmark found that it had poor face validity,
especially for questions on the social subscale (Santini et al., 2020[48]).


Although designed as measures of positive mental health, both WEMWBS and the 
WHO-5 have been shown to be effective screeners for depression and/or anxiety. A 
study found the WHO-5 to have high sensitivity, but low specificity, in identifying 
patients with depression in a clinical setting (Topp et al., 2015[49]). A study of 
SWEMWBS found it to be relatively highly correlated with the PHQ-9 (rho = 0.6-0.8)
and the GAD-7 (rho = 0.6-0.7), suggesting that it is an acceptable tool for measuring
common mental disorders (CMD); however, other tools may be more sensitive in
identifying and distinguishing between individuals with worse levels of mental
health (Shah et al., 2021[17]). A study comparing WEMWBS to the GHQ-12, through
multidimensional item-response theory, found that both tools appear to measure the
same underlying construct (Böhnke and Croudace, 2016[50]).


Self-reported mental health (SRMH) indicators have been compared to validated 
clinical measures of mental health and have been shown to be related to, though 
distinct from, other mental health scales. SRMH is correlated with the Kessler scales, 
the PHQ and the mental health component of the SF-12 and is often used in the 
validation process of other mental health screening tools as a test for convergent 
validity. Furthermore, SRMH is associated with poor physical health and an increased 
use of health services. Although related, research has shown that correlations
between SRMH and screening tools are moderate, suggesting that they are capturing
slightly different phenomena (Ahmad et al., 2014[51]). The authors note that further
research is needed but suggest that findings from longitudinal studies of
self-reported physical health could shed some light. SRMH measures were shown to be
stronger predictors of mortality, morbidity and service use than other indicators, and
SRMH may be capturing mental health problems that do not yet manifest in
screening tools (Ahmad et al., 2014[51]). Conversely, health-related quality of life
(HRQoL) - which measures both physical and mental health - has been found to have
construct and criterion validity that is good and comparable to the SF-36
scale (Moriarty, Zack and Kobau, 2003[21]).


Studies of mental health screening tools have yielded conflicting evidence as to 
whether single-item mental health questions are sufficiently valid. A study assessing 
the comparative performance of the MHI-5 and MHI-18 (which concluded in favour of 
the shorter version) found that even a single question - “how often were you feeling 
downhearted and blue?” - performed as well as the MHI-5, MHI-18 and GHQ-30 at 
detecting major depression (Berwick et al., 1991[38]). However, studies assessing 
ultra-short screening tools found that even two questions perform significantly better 
at screening for depression than does a single question (Lowe et al., 2010[13]). 
Conversely, the Australian Taking the Pulse of the Nation (TPPN) survey, administered 
throughout the COVID-19 pandemic, found that the psychometric properties of its 
single-item mental health measure compared favourably to the K6: the items were 
highly correlated (rho = 0.82), and the single-item measure had high sensitivity for 
psychological distress (Botha, Butterworth and Wilkins, 2021[52]). 


Box 3.5. Key messages: Validity 


• All of the mental health screening tools commonly used by OECD member
states have been validated in clinical settings and found to have strong convergent,
construct and criterion validity.

• Composite scales for positive mental health have also been found to have
strong psychometric properties, and they have proven effective as screeners for specific
mental health conditions such as depression and/or anxiety.

• Criterion validity is assessed by the survey tool’s performance in comparison to
a clinical diagnostic interview gold standard; however, this presupposes the validity of
clinical diagnoses, which may not hold in all contexts.


What do non-response rates tell us about stigma? How does this affect 
the comparability of mental health data across groups? 


The stigma associated with mental illness may lead to misreporting - and under- 
reporting - of one’s mental health conditions (Hinshaw and Stier, 2008[53]). Low 
levels of mental health literacy can also lead to under-reporting, with individuals not
recognising their own experienced symptoms as representative of an underlying
condition (Tambling, D’Aniello and Russell, 2021[54]; Dunn et al., 2009[55]; Coles and
Coleman, 2010[56]).6 Feelings of stigma towards mental health conditions remain
important in all OECD countries, with large differences between them. A survey
conducted in 2019 found that, in 19 OECD countries, 40% of respondents
did not agree with the statement that mental illness is just like any other illness, and a
quarter agreed that anyone with a history of mental disorders should be prevented
from running for public office (OECD, 2021[57]). Because of stigma, respondents may
either conceal their true conditions when answering mental health surveys or may
choose not to participate in the first place. When administering surveys on sensitive
subjects, providing clear assurances of data confidentiality and ensuring that the
interview is conducted in a private place, out of hearing of family members, minimise
the likelihood of respondent refusal (Singer, Von Thurn and Miller, 1995[58]; Krumpal,
2013[59]).7


Evidence shows that those experiencing psychological distress or a specific mental 
disorder are more likely to refuse to participate in a survey; this non-response bias 
then leads to underestimates of the overall prevalence of mental ill-health (de Graaf 
et al., 2000[60]; Eaton et al., 1992[61]; Mostafa et al., 2021[62]). A recent study, 
which compared the effect of psychological distress on a number of economic 
transitions (e.g. falling into unemployment), using both the GHQ-12 score and a 
version of it adjusted for misreporting behaviour scores, showed that the original 
version of the GHQ-12 score underestimated the effect of psychological distress on 
transitions into better-paid jobs and higher educational attainment (Brown et al., 
2018[63]). Thus, misreporting of symptoms of psychological distress can lead to 
biased and inconsistent estimates. However, not all studies come to the same 
conclusion: the US National Comorbidity Survey Replication (NCS-R) study found no 
evidence of non-response leading to underestimates of disorder prevalence (Kessler 
et al., 2004[64]). 


Evidence from the 2013 European Union Statistics on Income and Living Conditions
(EU-SILC) survey shows that non-response rates for mental health questions are high
(15%), but still comparable to those for other subjective variables (e.g. 13% for trust
in politics, 8% for satisfaction with one’s job) (Figure 3.1, Panel A). High non-response
rates for mental health (as measured through the MHI-5) may partly reflect the way in 
which the EU-SILC survey is implemented. Each year, an ad-hoc module featuring 
additional questions on a specific topic is implemented in addition to the core module 
(in 2013 this module focused on well-being), implying that some respondents may 
have problems in answering questions that were not asked in previous waves. 


Missing and non-response rates for mental health variables vary widely between 
countries (Figure 3.1, Panel B). In the 2013 EU-SILC survey, missing rates for the 
mental health module were higher than 20% in Ireland, Poland, France, Portugal, the 
United Kingdom and Lithuania, but were below 1% in Norway, Belgium, Switzerland, 
Luxembourg and Austria. 


Figure 3.1. Non-response rates are higher for mental health questions 
than they are for other variables, and vary substantially across 
countries 


Panel A: Share of non-response rates for different types of variables, Europea
Panel B: Share of non-response rates for mental health questions (MHI-5), Euror

Note: This figure only includes individuals who have agreed to participate in the survey
and subsequently chose not to answer individual question items; it does not consider
those who refuse to participate in the full survey. A respondent is deemed to be missing
mental health data if they refused, or replied “do not know”, to at least four of the five
individual items on the MHI-5. Refer to Annex 2.B for more information about specific
tools.

Source: OECD calculations based on the European Union Statistics on Income and Living
Conditions (EU-SILC) (n.d.[65]) (database),
https://ec.europa.eu/eurostat/web/microdata/european-union-statistics-on-income-and-living-conditions.

StatLink https://stat.link/s3v124 
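The missing-data rule in the note to Figure 3.1 (a respondent counts as missing when at least four of the five MHI-5 items are refused or answered "do not know") can be expressed as a short Python sketch. The answer codes and the sample records below are hypothetical and do not reflect the actual EU-SILC coding scheme.

```python
# Hypothetical codes standing in for "refused" and "do not know".
MISSING_CODES = {"refused", "dont_know"}


def is_missing_mhi5(answers, threshold=4):
    """Flag a respondent whose MHI-5 has at least `threshold` unusable items."""
    if len(answers) != 5:
        raise ValueError("the MHI-5 has exactly five items")
    unusable = sum(1 for answer in answers if answer in MISSING_CODES)
    return unusable >= threshold


def non_response_rate(respondents):
    """Share of participating respondents flagged as missing mental health data."""
    flagged = sum(1 for answers in respondents if is_missing_mhi5(answers))
    return flagged / len(respondents)


# Invented sample: valid answers are shown as integers on a 1-5 response scale.
sample = [
    ["refused"] * 5,                                     # missing
    ["dont_know", "refused", "refused", "refused", 3],   # missing (4 of 5)
    ["refused", "refused", "dont_know", 2, 4],           # kept (3 of 5)
    [1, 2, 3, 4, 5],                                     # kept
]
print(non_response_rate(sample))
```

Note that, as in the figure, the denominator contains only people who agreed to take part in the survey; unit non-response is not captured by this calculation.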


Table 3.2 shows some suggestive evidence that differences in non-response rates by 
country could be related to levels of stigma; in nine European OECD countries, the 
prevalence of any depressive disorder (as measured by the PHQ-8) is inversely 
correlated with the prevalence of mental health stigma, as measured by the share of 
the population who agree that people with a history of mental health problems should 
be excluded from running for office. In other words, in countries with more stigma, the
reported prevalence of depressive disorder risk is lower - perhaps because of reluctance to
report.8


Table 3.2. The relationship between stigma and prevalence is complex, but in 
some instances, stigma may lead to lower reported prevalence of mental 
disorders 


Correlations between indicators of stigma towards mental health and prevalence of 
mental health conditions, nine European OECD countries 
Note: Table displays listwise correlations. The three stigma questions ask respondents to indicate
the extent to which they agree with the statement. For the first, agreement entails stigma; for the
other two, agreement entails the absence of stigma. For details on the MHI-5 and PHQ-8
measures, see Annex 2.B. * Indicates that Pearson’s correlation coefficients are significant at the
p<0.10 level, ** at the p<0.05 level, and *** at the p<0.01 level. N = 9 countries.


Source: Stigma data originally come from an Ipsos survey, as published in OECD (2021[57]), Fitter
Minds, Fitter Jobs: From Awareness to Change in Integrated Mental Health, Skills and Work Policies,
Mental Health and Work, OECD Publishing, Paris, https://dx.doi.org/10.1787/a0815d0f-en; MHI-5
data come from OECD calculations based on the 2018 European Union Statistics on Income and
Living Conditions (EU-SILC) (n.d.[65]) (database),
https://ec.europa.eu/eurostat/web/microdata/european-union-statistics-on-income-and-living-conditions;
PHQ-8 data come from OECD calculations based on European Health Interview Survey (EHIS)
wave 3 data (n.d.[66]) (database),
https://ec.europa.eu/eurostat/statistics-explained/index.php?title=Glossary:European_health_interview_survey_(EHIS).
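With only nine countries, a correlation such as those in Table 3.2 rests on very few data points. The sketch below computes a Pearson coefficient and attaches the significance stars used in the table note. How the table's p-values were obtained is not stated, so an exact permutation test is used here as a stand-in that is feasible for nine observations; the country values are invented for illustration.

```python
import math
from itertools import permutations


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def exact_permutation_p(x, y):
    """Two-sided p-value: share of all re-orderings of y whose |r| is at least as large."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cx = [a - mean_x for a in x]
    cy = [b - mean_y for b in y]
    # r is proportional to the centred cross-product, so comparing that is enough.
    observed = abs(sum(a * b for a, b in zip(cx, cy)))
    hits = total = 0
    for perm in permutations(cy):
        total += 1
        if abs(sum(a * b for a, b in zip(cx, perm))) >= observed - 1e-12:
            hits += 1
    return hits / total


def stars(p):
    """Significance markers following the convention in the table note."""
    if p < 0.01:
        return "***"
    if p < 0.05:
        return "**"
    if p < 0.10:
        return "*"
    return ""


# Hypothetical values for nine countries (percentages).
stigma_share = [12, 15, 18, 21, 24, 27, 30, 33, 36]
phq8_prevalence = [9.1, 8.4, 8.8, 7.2, 7.9, 6.5, 6.9, 5.8, 6.1]

r = pearson_r(stigma_share, phq8_prevalence)
p = exact_permutation_p(stigma_share, phq8_prevalence)
print(round(r, 2), stars(p))
```

Enumerating all orderings is only practical for small samples (9! = 362,880 here); for larger samples a random subset of permutations or a parametric test would normally be used instead.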


In order to understand what bias is introduced by non-response rates, it is important 
to understand the profile of those who are choosing not to respond to mental health 
questions. Figure 3.2 shows these shares for a number of socio-demographic groups - 
gender, age cohort, education level, labour market status and income quintile. Panel A 
displays non-responses to mental health questions for 26 European OECD countries, 
while Panel B shows those for the United Kingdom. For both data sources, women, 
those with higher levels of education, and older age cohorts are more likely to answer 
mental health questions, while men, young people and those with lower levels of 
education are more likely to not respond. These results are in line with a report 
describing stigma towards mental health in Sweden: women were found to be more 
likely than men to have positive feelings towards those with mental health conditions, 
while young people were more likely than older people to report that it would be 
difficult to talk about their own mental illness with someone
else (Folkhälsomyndigheten, 2022[67]). In European countries, there is a clear
difference by income - those in the top income quintile are less likely to respond -
however, this pattern does not hold for the United Kingdom. A study on non-response
rates in longitudinal health surveys among the elderly in Australia found that those
with lower occupational status and less education were less likely to
participate (Jacomb et al., 2002[68]); however, neither risk for depression nor anxiety
influenced refusal rates. The Netherlands Mental Health Survey and Incidence Study-2
(NEMESIS-2) found higher non-response rates among young adults, leading to
under-reporting of specific mental disorders among this population (de Graaf, Have and Van
Dorsselaer, 2010[69]).


Figure 3.2. Young people, men and those with lower levels of education 
are less likely to respond to mental health questions 


Panel A: Share of non-response rates for mental health questions (MHI-5) across differen
European OECD 26, 2013
Panel B: Share of non-response rates for mental health questions (GSWEWMWBS, GHQ-
component), United Kingdom, 2019

Note: Refer to Annex 2.B for more information about specific tools.


Source: Panel A: OECD calculations based on the 2013 European Union Statistics on
Income and Living Conditions (EU-SILC) (n.d.[65]) (database),
https://ec.europa.eu/eurostat/web/microdata/european-union-statistics-on-income-and-living-conditions;
Panel B: OECD calculations based on University of Essex,
Institute for Social and Economic Research (2022[70]), Understanding Society: Waves 1- 
11, 2009-2020 and Harmonised BHPS: Waves 1-18, 1991-2009 (database). 15th Edition. 
UK Data Service. SN: 6614, http://doi.org/10.5255/UKDA-SN-6614-16, from wave 10 only 
(Jan 2018 - May 2020). 

StatLink https://stat.link/Lim7az 


Data on previous diagnoses for, or experienced symptoms of, specific mental 
disorders are likely to under-report population prevalence due to a combination of 
reticence to disclose personal medical history and inability to afford or access
care (Hinshaw and Stier, 2008[53]). Furthermore, prevalence of mental ill-health
based on these data is heavily influenced by the characteristics of health care 
systems in different countries and regions, including their ability to treat and diagnose 
a wide range of patients. For example, data predating the pandemic show that 67% of 
working-age adults who wanted mental health services reported difficulty in accessing 
treatment (OECD, 2021[71]). A study in Canada compared the self-reported use of
mental health services from the Canadian Community Health Survey (CCHS) with health
service administrative data from the government of Quebec (Régie de l’assurance
maladie du Québec - RAMQ), reporting significant discrepancies: 75% of mental 
health service users in the RAMQ registry did not report using services in the CCHS, 
with these disparities being highest for older people, those with lower levels of 
education and mothers of young children (Drapeau, Boyer and Diallo, 2011[72]). 
Another study for Australia examined the extent of under-reporting of mental illness 
by matching self-reported mental health information (self-reported diagnoses and 
self-reported prescription drug use) to administrative records of filled prescriptions for 
mental disorders; the researchers found that survey respondents are significantly 
more likely to under-report mental illnesses than other health conditions 
because of stigma (Bharadwaj, Pai and Suziedelyte, 2017[73]). 


Box 3.6. Key messages: Non-response bias and missing values 


• Those with worse underlying mental health may be more likely to refuse to 
participate in surveys, thereby understating the actual prevalence of mental ill-health; 
however, the evidence is not conclusive. 

• There is conclusive evidence that self-reported data on previous diagnoses and 
experienced symptoms of specific mental ill-health conditions are significantly influenced 
by stigma and bias. 

• Analysis from European OECD countries shows that younger people, men and 
those with lower levels of education are more likely to refuse to answer questions on 
mental health. 


Are the reliability and validity of these measures consistent across 
cultures and socio-demographic groups? 


Governments tasked with promoting population mental health need high-quality 
information to understand inequalities in mental health outcomes and whether 
national trends (either improvements or deteriorations) are masking differences 
within groups, so that they can target policy interventions to those who are most in 
need. For these reasons, a mental health indicator needs to be able to compare age 
cohorts, genders, race and ethnic groups, different education and income levels and 
other socio-economic markers. Ensuring comparability, however, is not 
straightforward. Cultural differences in perceptions of mental health may make some 
groups less likely to answer (or honestly answer) questions surrounding mental 
health. These challenges apply to both inter- and intra-country comparisons.9 


Figure 3.3. Prevalence of psychological distress and depressive 
symptoms risk varies by as much as 100 percent across European 
OECD countries 
Note: Panel A: risk for psychological distress is defined as having a score >= 52 on a scale 
from 0 (least distress) to 100 (most); Panel B: a respondent is deemed to be at risk for 
major depressive disorder if they answer “more than half the days” to either of the first 
two questions on the PHQ-8, and, in addition, if five or more of the eight items are 
reported as “more than half the days”. They are at risk for “other depressive disorders” if 
they answer “more than half the days” to either of the first two questions on the PHQ-8, 
and in addition, a total of two to four of the eight items are reported as “more than half 
the days” (Eurostat, n.d.[74]). Refer to Annex 2.B for more information on individual 
screening tools. 

Source: Panel A: OECD calculations based on the 2018 European Union Statistics on 
Income and Living Conditions (EU-SILC) (n.d.[65]), (database), 
https://ec.europa.eu/eurostat/web/microdata/european-union-statistics-on-income-and-living-conditions; 
Panel B: European Health Interview Survey (EHIS) wave 3 data (n.d.[66]) (database), 
https://ec.europa.eu/eurostat/statistics-explained/index.php?title=Glossary:European_health_interview_survey_(EHIS). 

StatLink https://stat.link/ocxvgt 
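
The PHQ-8 scoring rule described in the note to Figure 3.3 can be sketched in code. This is a minimal illustration, not the Eurostat implementation: it assumes each of the eight items is coded 0 to 3, and that a code of 2 or above counts as "more than half the days" (i.e. that "nearly every day" also meets the threshold). The function name and the returned labels are hypothetical.

```python
from typing import Sequence

# Assumption: items are coded 0 = "not at all", 1 = "several days",
# 2 = "more than half the days", 3 = "nearly every day".
THRESHOLD = 2


def classify_phq8(items: Sequence[int]) -> str:
    """Return a hypothetical risk label for one respondent's eight PHQ-8 answers."""
    if len(items) != 8 or any(score not in (0, 1, 2, 3) for score in items):
        raise ValueError("expected eight item scores coded 0 to 3")

    flagged = [score >= THRESHOLD for score in items]
    core_symptom = flagged[0] or flagged[1]  # either of the first two questions
    n_flagged = sum(flagged)                 # items at or above the threshold

    if core_symptom and n_flagged >= 5:
        return "major depressive disorder"
    if core_symptom and 2 <= n_flagged <= 4:
        return "other depressive disorders"
    return "not at risk"
```

Under this rule a respondent who flags many items but neither of the first two is not classified as at risk, which shows how strongly the result depends on the two core questions.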


Data from European countries show large variations in the prevalence of psychological 
distress and depressive symptoms. The prevalence of psychological distress in 
Portugal, France and Lithuania is more than twice that of Ireland, Norway, Poland and 
Switzerland (Figure 3.3, Panel A). Similarly, the prevalence of depressive symptoms in 
France, Sweden and Germany is more than double that of Greece, the Slovak Republic 
and the Czech Republic, among others (Figure 3.3, Panel B). Yet how much of this is 
due to differences in the underlying mental health of each population, and how much 
is due to cultural differences leading to differential response behaviours for these 
screening tools? 


Some of these cross-country differences could stem from different levels of stigma 
towards mental health, with countries having lower overall prevalence levels also 
showing higher levels of stigma (refer to the previous section, and Table 3.2). 


Comparisons of the prevalence of mental ill-health can be difficult within countries, as 
well. Panel A of Figure 3.4 shows that women in 26 European OECD countries are 
more likely to report higher levels of psychological distress than men, at all stages of 
their life. Panel B of Figure 3.4 also suggests that white Americans have higher levels 
of psychological distress than other racial/ethnic groups, and that Asian-Americans 
have the lowest levels. Research has shown that there are systematic gender 
differences in self-report bias, as men tend to minimise their symptoms more than 
women do (Brown et al., 2018[63]). One study also found that men, but not women, 
reported fewer depressive symptoms when consent forms indicated that a more 
involved follow-up might occur (Sigmon et al., 2005[75]). A survey on attitudes 
towards mental health and stigma in Sweden found that women were more likely than 
men to report positive attitudes towards those with mental health conditions 
(Folkhälsomyndigheten, 2022[67]). How much of the visible difference, therefore, 
is due to differences in reporting rather than differences in actual underlying mental 
health? 


Figure 3.4. Are differences in reported outcomes by gender and 
race/ethnicity due to differences in underlying mental health or to 
measurement issues? 
Note: Scoring information for Panel A: risk for psychological distress is defined as having a 
score >= 52 on a scale from 0 (least distress) to 100 (most); Panel B: risk for 
psychological distress is defined as having a score >= 13 on a scale from 0 (least 
distress) to 24 (most). Refer to Annex 2.B for more information on individual screening 
tools. 

Source: Panel A: OECD calculations based on the 2018 European Union Statistics on 
Income and Living Conditions (EU-SILC) (n.d.[65]), (database), 
https://ec.europa.eu/eurostat/web/microdata/european-union-statistics-on-income-and-living-conditions; 
Panel B: OECD calculations based on University of Michigan (2021[76]), Panel Study of 
Income Dynamics (database), https://psidonline.isr.umich.edu/default.aspx, data from 2019 only. 

StatLink https://stat.link/3yntk5 


To answer these questions on measurement bias and cross-group comparability, 
researchers assess whether surveys have structural invariance by key socio- 
demographic characteristics in the process of clinically validating screening tools. 
Screening tools for symptoms of depression and anxiety, along with the WHO-5 and 
SWEMWBS scales for positive mental health, are the tools that have been most 
frequently validated across numerous settings (e.g. gender, age cohorts and 
racial/ethnic groups). 


In terms of screening tools for specific mental ill-health conditions, both the PHQ-8 
and GAD-7 have been found to be free from basic gender and age biases. The PHQ-8 
and -9 have been validated across a number of clinical settings, with different age 
groups, cultures and racial/ethnic groups (El-Den et al., 2018[8]), and the PHQ-2 has 
been validated for use in youth and adolescents (Richardson et al., 2010[77]). A study 
of the PHQ-4, limited to the United States, found no structural invariance by gender 
and age: its findings may extend to other countries with similar population structures, 
but not necessarily to others with different population structures (Sunderland et al., 
2019[33]). 


Positive mental health composite scales also perform strongly on age and gender 
generalisability. The WHO-5 has been shown to have good construct validity for all 
age groups and has been deemed suitable for children aged 9 and above (Topp et al., 
2015[49]). The MHC-SF performs well across sex, age cohorts and education 
levels (Santini et al., 2020[48]). WEMWBS was originally validated for an adult 
population but has since been validated for use in youth aged 11 and above (Warwick 
Medical School, 2021[78]). In the course of validating the 14-item version of 
WEMWBS, researchers found evidence that two items showed bias for gender; for 
example, for any level of mental well-being, men were more likely than women to 
answer positively for the item “I’m feeling more confident” (Stewart-Brown et al., 
2009[45]). These two items were removed in the process of creating the 7-item 
version of the screening tool (SWEMWBS). This short form displays no response rate 
differences by gender, marital status or household income (Tennant et al., 2007[14]). 


Evidence for validity across racial groups is more mixed for all tools, and much of the 
evidence comes from either the United States or Canada. There is mixed support for 
cross-cultural invariance of the CES-D’s factor structure across Latino and Anglo- 
American populations (Crockett et al., 2005[79]; Posner et al., 2001[80]); one study 
found that item-level modifications were needed for the CES-D when administered to 
older Hispanic/Latino and Black respondents (El-Den et al., 2018[8]). Other studies 
also indicate that Asian-American and Armenian-American populations have a 
different factor structure, higher depressive symptoms, and a tendency to over- 
endorse positive affect items, in comparison to Anglo-Americans (Iwata and Buka, 
2002[81]; Demirchyan, Petrosyan and Thompson, 2011[82]). Research in the United 
Kingdom implementing the GHQ-12 across diverse racial and ethnic groups found 
some suggestive evidence of differences by group, requiring further study (Bowe, 
2017[83]). A study comparing Korean-American and Anglo-American older adults 
found that cross-cultural factors may significantly influence the diagnostic accuracy of 
depression scales and potentially result in the use of different cut-off scores for 
different populations (Lee et al., 2010[84]). Another study revealed that Black/African- 
American participants with high GAD symptoms scored lower on the GAD-7 than other 
participants with similar GAD symptoms (Parkerson et al., 2015[85]; Sunderland et al., 
2019[33]). 


Measures of self-reported mental health may also be susceptible to bias by 
racial/ethnic identity. US studies have found that ethnicity appears to moderate the 
relationship between SRMH and a range of mental health conditions. For example, 
Black and Hispanic/Latino Americans are more likely to report excellent SRMH than 
white Americans and show a weaker association between SRMH and service use. A 
study in Canada found that Asian identity was associated with worse SRMH even after 
controlling for socio-economic status (Ahmad et al., 2014[51]). 


Many screening tools have been translated into multiple languages and used in 
surveys across the globe. The WHO-5, K10, MHI-5, GAD-7 and WEMWBS have all been 
translated into a number of languages (Sunderland et al., 2019[33]); the WHO-5, for 
example, has been translated into more than 30 languages and implemented in 
surveys in six continents (Topp et al., 2015[49]).10 WEMWBS has been used across 50 
countries and translated into 36 languages (Stewart-Brown, 2021[16]; Warwick 
Medical School, 2021[78]). Psychometric evaluations for the MHC-SF have also been 
conducted in many countries (Petrillo et al., 2015[47]; Joshanloo et al., 2013[86]; Guo 
et al., 2015[46]); however, cross-country comparisons in rates of flourishing show a 
high degree of variability, some of which may be driven by measurement issues 
rather than differences in latent mental health (Santini et al., 2020[48]). 


Cultural differences pose significant challenges in establishing uniform definitions and 
descriptions of mental health and threaten cross-country comparisons (see Box 3.4). 


Cross-cultural validation refers to whether mental health measures that were 
originally generated in a given culture are applicable, meaningful and thus equivalent 
in another culture (Huang and Wong, 2014[87]). Most widely used mental health 
scales have been developed and validated in high-income, Western and English- 
speaking populations (e.g. North America, Europe, Australia) and therefore assume a 
Western understanding of mental disorders and symptoms (Sunderland et al., 
2019[33]). This can raise questions as to their applicability to other population groups. 
For example, a review of 183 published studies on the mental health status of 
refugees reported that 78% of the findings were based on instruments that were not 
developed or tested specifically in refugee populations (Hollifield et al., 2002[88]). 


Evidence on cross-cultural validation for different tools is mixed. WEMWBS has been 
validated in 17 different languages and local populations as well as for minority 
populations within the United Kingdom (Warwick Medical School, 2021[78]). Although 
the PHQ has been validated in many settings and is considered to be one of the more 
robust screening tools, one study found that, when applied in middle- or low-income 
countries, it performed well only in student samples and not in clinical samples, 
leading researchers to suggest that it should be used only in settings with relatively 
high rates of literacy (El-Den et al., 2018[8]; Ali, Ryan and De Silva, 2016[32]). 
Similarly, scoring schemes - i.e. the process of determining what score on a screening 
tool designates risk for a specific mental issue - are often calibrated based on the US 
general population, where the initial clinical study took place. The scoring scheme of 
the Kessler scales was designed to seek out maximum precision in the 90th-99th 
percentile of the general population distribution, because of US epidemiological 
evidence that, in any given year, between 6% and 10% of the US population meet the 
definition of having a serious mental illness. Therefore, these scoring schemes may 
not be appropriate for other populations with different structures (Kessler et al., 
2002[89]). As another example, the mental health component of the SF-12 is typically 
scored using US-derived item weights for each response category (Ware et al., 
2002[90]). International comparisons have been done in Europe and Australia, which 
have found these weights to be appropriate (Vilagut et al., 2013[91]), but this does 
not necessarily extend to other regions. 


Research is clearly needed on culturally specific mental health scales developed using 
a bottom-up and open-ended approach, or with a greater degree of local adaptation, 
and with further testing of existing scales across different cultures and 
ethnicities (Sunderland et al., 2019[33]). Furthermore, advances in psychometric 
models and computational statistics have led to new developments in the 
administration and scoring of screening tools, which can facilitate cross-cultural 
analyses.11 Yet it is important to contextualise the magnitude of these differences. 
Research using data from the Gallup World Poll covering 150 countries on cross- 
country differences in measures of positive mental health, including life satisfaction, 
has found that cultural differences account for only 20% of inter-country variation in 
outcomes. This 20% includes both the impact of different cultures on outcomes, as 
well as potential measurement bias, an amount that is small in comparison to the 
impact of objective conditions - such as income, education and employment (Exton, 
Smith and Vandendriessche, 2015[92]; OECD, 2013[1]). The impact that these 
objective life conditions have on mental health is also likely to be larger than that of 
cultural bias. This does not negate the importance of better designing and validating 
mental health tools across populations, but it does provide a needed reminder that 
mental health indicators are informative and useful for policy. 


Box 3.7. Key messages: Accuracy across groups 


• Differences in attitudes toward mental health can lead to differential reporting 
across countries, as well as by gender, age and racial/ethnic identity. 


• Surveys on stigma and discriminatory views have shown differences in attitudes 
toward mental health by age and gender. 


• In the process of validation, screening tools are tested for biases by age, gender 
and racial/ethnic group. While most screening tools for specific mental ill-health 
conditions and most composite scales for positive mental health perform well for age and 
gender, evidence for race/ethnicity is mixed. More research is needed on the performance 
of self-reported general mental health questions. 


• Survey items must be validated in local populations to ensure their suitability. 
Validation studies conducted in one geographic area, or in one population group, may not 
be applicable to other contexts. 


How comparable are measures over time? 


A key goal of policy makers is to understand trends over time. Is population mental 
health improving or deteriorating? Do policy interventions lead to visible changes in 
mental health outcomes? It is therefore important that the accuracy of chosen 
indicators holds not only cross-sectionally but also over time. There are two 
complications in measuring mental health over time: (1) behavioural and attitudinal 
changes towards mental health, leading to different response behaviour; and (2) the 
fact that many of the screening tools have been validated against clinical diagnoses in 
cross-sectional studies, which may not provide sufficient evidence that they are 
sensitive to changes over time. 


Attitudes towards mental health have changed over the years, and while stigma and 
bias remain, progress in reducing them has been made. In recent years, governments 
across the OECD have pursued public information campaigns, especially centred in 
schools and educational institutions, to destigmatise mental illness. Even before the 
COVID-19 pandemic, 12 OECD countries waged national campaigns to improve mental 
health literacy, and five had regional or local campaigns (OECD, 2021[71]). Initial 
evidence of the impact is mixed: while some studies show little to no decline in stigma 
to mental health conditions, especially in the long run (Deady et al., 2020[93]; Walsh 
and Foster, 2021[94]), others point to an increase in service use, such as visits to 
psychiatric emergency departments (Cheng et al., 2016[95]). A study in the United 
Kingdom found that exposure to mental health campaigns may have led to an 
increase in symptoms of mental ill-health among young people; the research suggests 
that this was not because of a newfound awareness of pre-existing feelings, but a causal 
result of increased information about mental illness (Harvey, n.d.[96]). Other early 
research in this vein posits that awareness campaigns may lead to individuals categorising 
their feelings and emotions - which may be mild or moderate - as more concerning 
indications of mental distress, which may then change their own perceptions and 
behaviours, thus leading to actual worsening of symptoms (Foulkes and Andrews, 
2023[97]). 


If anti-stigma campaigns are indeed having their intended impact, then general 
population attitudes toward poor mental health may be changing, and the average 
person may feel more comfortable speaking openly and honestly about their mental 
health. This could distort estimates of mental ill-health prevalence over time. If the 
general population ten years ago felt less comfortable honestly answering questions 
on how often they felt “down, depressed or hopeless” over the past two weeks, one 
might expect higher rates of non-response, or of respondents lying about their true 
feelings, than today; as a result, one would expect to see the reported prevalence of 
psychological distress increase just because of this change in attitudes. 
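
The mechanism can be made concrete with a stylised calculation. The sketch below assumes, purely for illustration, that reported prevalence equals true prevalence multiplied by the share of affected respondents who are willing to disclose their symptoms. The function name and all numbers are hypothetical and are not estimates drawn from any survey.

```python
def reported_prevalence(true_prevalence: float, disclosure_rate: float) -> float:
    """Share of the population reporting symptoms, under the stylised assumption
    that only affected people who are willing to disclose report them."""
    for name, value in (("true_prevalence", true_prevalence),
                        ("disclosure_rate", disclosure_rate)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie between 0 and 1")
    return true_prevalence * disclosure_rate


# Hypothetical scenario: true prevalence is unchanged at 10%, but the share of
# affected respondents who answer honestly rises from 60% to 80%.
ten_years_ago = reported_prevalence(0.10, 0.60)
today = reported_prevalence(0.10, 0.80)
print(f"Reported prevalence: {ten_years_ago:.1%} then, {today:.1%} now")
```

In this made-up scenario reported prevalence rises by two percentage points although underlying mental health has not changed, which is the kind of distortion that would complicate comparisons over time.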


Box 3.8. Changes in mental health during the pandemic 


During and in the wake of the COVID-19 pandemic, mental health deteriorated in most 
OECD countries, with rates of symptoms of depression and anxiety doubling in 
some (OECD, 2021[98]; OECD, 2021[57]). Indeed, for eight European OECD countries 
that have comparable pre-pandemic baseline data, the share of the population at risk 
for depression rose substantially, and by more than 20 percentage points in Italy and 
France (Figure 3.5; OECD, 2021[99]). A study looking at data from January 2020 to 
January 2021 estimated that the shares of people experiencing symptoms of anxiety 
and depression were 28% and 26% higher, respectively, in 2020 than they would have 
been had the pandemic not occurred (OECD, 2021[57]).12 Both longitudinal and 
cross-sectional studies in European countries have found that positive mental health - 
measured through the WHO-5 (an affect-based measure), and SWEMWBS or the MHC- 
SF (combining aspects of affect, eudaimonia, social connections and life evaluation) - 
significantly deteriorated over the course of the pandemic (Thygesen et al., 
2021[100]; Vistisen et al., 2022[101]; Eurofound, 2021[102]; Vinko et al., 2022[103]). 


While the increase in prevalence of symptoms for mental ill-health is more or less 
agreed upon, it remains to be seen whether this increase will persist, or whether 
mental health will revert to pre-pandemic levels relatively quickly. As of mid-2021, 
overall mental health had not recovered to pre-pandemic levels; however, there were 
suggestions of recovery in some OECD member states (OECD, 2021[57]; OECD, 
2021[99]). Even so, certain population groups who were particularly negatively 
affected, such as young people, continue to face many challenges (OECD, 2021[99]). 


Figure 3.5. Symptoms of depression rose substantially in eight 
European OECD countries in the first year of the pandemic 
Share of respondents at risk of depression, 2020 and 2021 vs. 2014 
Note: Data from 2020 and 2021 come from a different data source than do data from 
2014, meaning that caution should be taken in interpreting numerical increases in any 
individual country. Both data sources use the PHQ-2 as a measure for depression risk. 
Data for 2020 and 2021 come from the YouGov COVID-19 behaviour tracker: 2020 pooled 
averages run from April through December, and 2021 pooled averages run from January 
through June. Baseline data come from the European Health Interview Survey (EHIS) wave 
2 in 2014. Refer to Annex 2.B for more information on individual screening tools. 


Source: OECD (2021[99]), COVID-19 and Well-being: Life in the Pandemic, OECD 
Publishing, Paris, https://dx.doi.org/10.1787/1e1ecb53-en. 
StatLink https://stat.link/szdacx 


Figure 3.6 provides some evidence against the hypothesis that greater openness about 
mental health has, by itself, pushed up reported prevalence over time. Pre-pandemic data 
from over 20 European OECD countries and the United States show either improving 
or stable mental health in the years following the financial crisis and preceding 
COVID-19 (prevalence of symptoms of anxiety and depression rose dramatically in 2020 at 
the onset of the pandemic, see Box 3.8). Prevalence of psychological distress in the 
United States from 2009 to 2019 (measured every two years using the K6 screening tool) 
remained broadly stable over the decade, hovering around 4% (Panel A). Although this 
comparison does not control for any socio-demographic factors, it suggests that concerns 
about changing perceptions producing large shifts in response behaviour, and thus higher 
reported prevalence rates, may not hold. That said, there is some evidence of higher, 
and potentially rising, prevalence among young people aged 16 to 24, which may 
reflect a combination of changing circumstances (socio-political, economic, climate- 
related), changing attitudes among young people toward mental health (making them 
more willing to speak openly to an enumerator), and smaller sample sizes in this 
cohort (leading to more noise in the data). Across 26 European OECD countries, 
psychological distress decreased between 2013 and 2018, which would not be 
expected if changes in behaviours made respondents more likely to speak honestly 
about their poor mental health (Panel C); a similar story is shown in Panel D, which 
shows psychological flourishing in 24 European OECD countries rose between 2011 
and 2016. Conversely, data from the United Kingdom (Panel B) show a deterioration of 
population mental health (as measured by the mental health component of the SF-12), 
while both positive mental health (SWEMWBS) and the share at risk for a 
common mental disorder (GHQ-12) remained more or less stable. 


One possible reason why some population surveys may show relatively stable mental 
health prevalence over time could be that those measures lack sensitivity to change. 
Many mental health screening tools were validated in cross-sectional clinical samples; 
researchers therefore caution that they have not been tested for sensitivity to 
changes over time, which can only be assessed with longitudinal data (Ahmad et al., 
2014[51]; Tennant et al., 2007[14]; Moriarty, Zack and Kobau, 2003[21]; Spitzer et al., 
2006[11]). However, there are exceptions. Some studies have found that the PHQ-8 
and PHQ-2 are sensitive to change over time (Lowe et al., 2010[13]). In terms of 
positive mental health, both WEMWBS and the shorter SWEMWBS have been shown to 
be sensitive to change over time for both groups and individuals: researchers suggest 
that +/- 3 points on the WEMWBS scale, and +/- 1 to 3 points on the SWEMWBS scale, 
indicate a significant change (NHS Health Scotland, 2016[15]; Shah et al., 2018[104]). 


One approach to identifying measurement bias that is driven by changes in 
individuals’ characteristics or circumstances over time is to use longitudinal data. A 
study by Ploubidis et al. (2019[105]) used two nationally representative surveys in the 
United Kingdom to track age cohorts over two decades. Using a generalised latent 
variable measurement modelling framework, researchers tested whether 
respondents’ answers to questions on the Malaise Inventory (a 9-item survey 
measuring psychological distress) were affected by the passage of time and found 
little evidence for the presence of bias in the form of age effects, survey design, 
period effects or cohort-specific effects. 


Figure 3.6. Until 2019, mental health improved somewhat in European 
OECD countries, and remained roughly stable in the United States, 
despite greater mental health awareness 
Panel C: Share of the population at risk for mental distress (MHI-5), 
European OECD 26, 2013 vs. 2018 
Note: Scoring information for each screening tool. Panel A: risk for psychological distress 
if score is >= 13 on a scale from 0 (least distress) to 24 (most). Panel B: at risk for a 
mental condition if score is <= 50 on the transformed SF-12 mental health component 
composite scale, where 0 indicates worst mental health and 100 best possible mental health; 
risk for a probable common mental disorder (CMD) if score is >= 4 on the GHQ-12, as 
used in Woodhead et al. (2012[106]); good mental health is defined as having a 
SWEMWBS score more than one standard deviation above the sample average. Panel C: 
risk for psychological distress if score is >= 52 on a scale from 0 (least distress) to 100 
(most). Panel D: psychological flourishing if score is >= 14 on a scale from 0 (worst 
mental health outcome) to 24 (best). Refer to Annex 2.B for more information. 

Source: Panel A: OECD calculations based on University of Michigan (2021[76]), Panel 
Study of Income Dynamics (database), https://psidonline.isr.umich.edu/default.aspx; 
Panel B: OECD calculations based on University of Essex, Institute for Social and Economic 
Research (2022[70]), Understanding Society: Waves 1-11, 2009-2020 and Harmonised 
BHPS: Waves 1-18, 1991-2009 (database). 15th Edition. UK Data Service. SN: 
6614, http://doi.org/10.5255/UKDA-SN-6614-16; Panel C: OECD calculations based on the 
2013 and 2018 European Union Statistics on Income and Living Conditions (EU-SILC) 
(n.d.[65]), (database), 
https://ec.europa.eu/eurostat/web/microdata/european-union-statistics-on-income-and-living-conditions; 
Panel D: OECD calculations based on the 2011 and 2016 European Quality of Life Survey 
(n.d.[107]), (database), https://www.eurofound.europa.eu/surveys/european-quality-of-life-surveys. 
StatLink https://stat.link/y5vgfp 
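
The cut-offs listed in the note to Figure 3.6 all turn a scale score into a binary at-risk flag, and prevalence is then the share of flagged respondents. A minimal sketch of that step follows. It assumes unweighted data (survey weights, which population estimates normally rely on, are left out for brevity), and the function name and all scores are made up for illustration.

```python
from typing import Iterable


def share_at_risk(scores: Iterable[float], cutoff: float,
                  higher_is_worse: bool = True) -> float:
    """Unweighted share of respondents at or beyond a cut-off score.

    higher_is_worse=True flags scores >= cutoff (e.g. K6 >= 13, GHQ-12 >= 4);
    higher_is_worse=False flags scores <= cutoff (e.g. SF-12 mental
    component <= 50, where 100 is the best possible outcome).
    """
    values = list(scores)
    if not values:
        raise ValueError("at least one score is required")
    if higher_is_worse:
        flagged = sum(1 for s in values if s >= cutoff)
    else:
        flagged = sum(1 for s in values if s <= cutoff)
    return flagged / len(values)


# Made-up scores, not survey data.
k6_scores = [2, 5, 13, 0, 20, 7, 9, 1]
sf12_scores = [35.0, 50.0, 62.5, 48.0]

k6_share = share_at_risk(k6_scores, cutoff=13)
sf12_share = share_at_risk(sf12_scores, cutoff=50, higher_is_worse=False)
```

Because each tool uses its own scale direction and threshold, the resulting shares describe related but not identical concepts, which is one reason the panels of Figure 3.6 are not directly comparable with one another.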

Box 3.9. Key messages: Accuracy over time 


• Changes in attitudes towards mental health conditions, and comfort discussing 
these topics openly, may lead to measurement bias when comparing prevalence rates 
over time. 


• While evidence from impact assessments of anti-stigma campaigns is scarce, 
pre-pandemic evidence from cross-country trend data does not show the clear increase in 
reported psychological distress that lower stigma would imply. 


• Some mental health screening tools - including the PHQ-8 and (S)WEMWBS - 
have been found to be sensitive to change over time in longitudinal studies, but other 
tools have not been subject to sensitivity analysis. 


• More research into the presence of bias in the form of age, period or cohort-specific 
effects for mental health outcomes should be done. 


Data collection 


The practicalities of data collection can have an important impact on respondent 
behaviour, affecting the comfort and ease with which they interact with an 
enumerator and thus answer questions. Whether questions are framed as positive or 
negative statements, and the way in which data are collected (enumerator- vs. 
self-administered), both shape the quality of the eventual output. Because of the sensitive 
nature of mental health questions, especially for screening tools that deal with suicide 
and suicidal ideation, additional protocols should be put in place to ensure both 
respondent and enumerator safety and well-being. 


How does question wording affect respondents’ attitudes and response 
behaviour? 


The order in which questions are asked, and the way in which questions are framed, 
may prime respondents to answer in a certain way. OECD research into subjective 
well-being shows that the influence of question ordering on life evaluation and affect 
questions can be significant; because of this, subjective well-being questions should 
be placed early on in surveys to minimise interference from other modules (OECD, 
2013[1]). 


For mental health questions, there is some evidence suggesting that questions may 
be upsetting to respondents, raising ethical concerns. Some studies have shown that 
participating in a survey with distressing questions, or answering questions focusing 
on distressing life events, can increase respondents’ stress and worsen their 
mood (Labott et al., 2013[108]), especially among populations already at risk for 
psychological distress. However, other research into the impacts of mental health 
surveys on the mood of respondents has not found evidence of significant 
effects (Jorm et al., 1994[109]; Jacomb et al., 1999[110]). The small portion of 
interviewees who did report feeling distress were more likely to be young women and 
people lacking social support (Jacomb et al., 1999[110]). 


Within a given mental health screening tool or composite scale, framing questions in a 
positive or negative light can affect responses. Some tools use only negative 
question framing (e.g. PHQ-8, CES-D, K6), some only positive (e.g. (S)WEMWBS, WHO-5), 
and some employ a mix of the two (e.g. GHQ-12, MHI-5). A negatively framed 
question might ask, for example, how often someone felt downhearted and 
depressed, whereas a positively framed question might ask how often someone felt 
cheery and light-hearted. A respondent may feel more comfortable answering that 
s/he “rarely” felt cheery, rather than answering that s/he “always” felt depressed. This 
point is illustrated in Table 3.3, which relies on data from the UKHLS. The same 
sample of respondents was asked questions from three different mental health 
screening surveys. There are overlaps in the types and content of questions asked; 
pairs of questions are showcased in the top portion of the table. Correlations in 
responses are highest for questions that appear in tools with similar tone framing, 
either positive or negative (e.g. feeling downhearted and depressed from the mental 
health component of the SF-12 vs. feeling unhappy and depressed from the GHQ-12), 
and lowest for items that come from tools that use different framing (e.g. been able to 
face up to problems from the GHQ-12 and dealing with problems well from 
SWEMWBS). 


Users of mental health services in the United Kingdom have expressed a preference
for survey tools that focus on positive, rather than negative, emotions
(Stewart-Brown, 2021[16]). In one study, service users reported that it was
“upsetting” to be asked a series of negative items in mental health questionnaires,
and they expressed a preference for questions - WEMWBS, specifically - that focus
on aspects of good mental health (Crawford et al., 2011[111]).


Table 3.3. The correlation of answers to similar questions depends on whether
the phrasing is positive or negative


Correlation between similarly worded questions on different mental health screening tools


Question phrasing


GHQ-12                                  SF-12                            SWEMWBS
Been able to face up to your problems   -                                Been dealing with problems well
Been feeling unhappy and depressed      Felt downhearted and depressed   -
-                                       Felt calm and peaceful           Been feeling relaxed


Answer correlations


Item pair                   Tools compared        Correlation
Facing up to problems       GHQ-12 and SWEMWBS    0.36
Feeling unhappy/depressed   GHQ-12 and SF-12      0.67
Feeling relaxed             SF-12 and SWEMWBS     0.56


Note: Correlations show the pairwise Pearson correlation coefficient between similarly phrased 
questions appearing on different mental health screening tools or scales from the same 
longitudinal survey. Refer to Annex 2.B for more information about specific tools. 


Source: OECD calculations based on University of Essex, Institute for Social and Economic
Research (2022[70]), Understanding Society: Waves 1-11, 2009-2020 and Harmonised BHPS:
Waves 1-18, 1991-2009 (database). 15th Edition. UK Data Service. SN:
6614, http://doi.org/10.5255/UKDA-SN-6614-16, from wave 10 only (Jan 2018 - May 2020).
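
The pairwise Pearson correlation described in the note to Table 3.3 can be sketched as follows. This is a minimal illustration only: the responses are invented rather than taken from the UKHLS, the 1-5 response coding and the function name `pearson` are assumptions, and survey weights are ignored.

```python
from math import sqrt


def pearson(x, y):
    """Pairwise Pearson correlation, dropping respondents missing either item."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    return cov / sqrt(var_x * var_y)


# Hypothetical answers from the same six respondents to two negatively framed
# items (1 = never ... 5 = always); None marks item non-response.
ghq_unhappy = [1, 2, 4, 5, 3, None]
sf12_downhearted = [1, 3, 4, 4, 2, 2]

r = pearson(ghq_unhappy, sf12_downhearted)
print(f"pairwise correlation: {r:.2f}")
```

Dropping incomplete pairs, rather than whole respondents, is what makes the coefficient "pairwise": each item pair uses every respondent who answered both items.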


Box 3.10. Key messages: Question framing 


• OECD research has shown that the ordering of subjective questions in
household surveys can influence responses; consistency in ordering across
surveys is therefore important, and whenever possible these questions should appear
early in surveys.


• Evidence from the United Kingdom’s Understanding Society survey shows that
whether a concept is framed in a negative or a positive light can lead to different
responses.


• Some users of mental health services have expressed a preference for survey
tools that focus on positive rather than negative emotions.


Does the survey mode affect respondents’ answers? 


Survey modes, i.e. the way in which data are collected from respondents, can 
influence how respondents process and reply to questions, as well as how much 
information they feel comfortable revealing. One of the main drivers of differential 
responses based on survey mode is social desirability bias: the tendency to present 
oneself in a favourable light and/or provide responses that conform to prevailing 
social norms. Social desirability has two components: impression management and 
self-deception (Paulhus, 1984[112]). Research has shown that interview subjects 
under-report taboo topics and over-report socially desirable actions (Krumpal, 
2013[59]; Presser and Stinson, 1998[113]). Social desirability bias can present itself in 
different ways, depending on the way in which data are collected - by an interviewer 
or self-administered, in person, or over the phone or Internet. 


Research has found that respondents are more likely to report better physical and
mental health outcomes in interviewer-administered surveys as compared to
self-administered surveys. Research has also shown that interviewer-administered survey
respondents are less likely to report stigmatised medical conditions, including anxiety
and mood disorders (Krumpal, 2013[59]; Latkin et al., 2017[114]). A study in Norway
found that respondents were more likely to report symptoms of anxiety and
depression in self-administered surveys as compared to interviewer-administered
(either in person or over the phone) surveys (Moum, 1998[115]); the presence of
social desirability bias was particularly strong for young, well-educated respondents.
Another study comparing audio computer-assisted self-interviewing (ACASI) with
interviewer-administered paper-and-pencil (I-PAPI) surveys concluded that
respondents were more likely to report mental health symptoms - as measured by the
WHO-CIDI - in the self-administered survey than in the interviewer-administered
one (Epstein, Barker and Kroutil, 2001[116]), with large differences for major
depressive episodes and generalised anxiety disorder.13


When surveys are administered by an interviewer, evidence is inconsistent as to
whether respondents report better mental health outcomes face-to-face than over the
phone or Internet. Evidence from the Canadian Community Health Survey (CCHS)
suggests that while some physical health indicators are subject to mode effects,
mental health outcomes are not (St-Pierre and Béland, 2004[117]).14 A study in the
United States found the opposite impact, with respondents exhibiting stronger social
desirability behaviour in telephone interviews than in person (Holbrook, Green and
Krosnick, 2003[118]). Finally, in comparing proctored web-based assessments to
paper-and-pencil administration modes, a recent meta-analytic review reported that
web-based surveys do not offer an advantage regarding social desirability in
self-report questionnaires, and that the mode of administration does not affect reporting
of mental health symptoms (Gnambs and Kaspar, 2017[119]).


Mode effects in interviewer-administered surveys are illustrated by Figure 3.7. The 
figure shows the share of the population with a negative affect balance - defined as 
reporting to have experienced more negative, rather than positive, emotions on the 
previous day - from Gallup World Poll data. Gallup conducts annual surveys in over 
150 countries, including all 38 OECD countries. Data are collected via telephone 
surveys in many OECD countries; however, face-to-face interviews are common in 
many places in Latin America, the Middle East, Asia, Africa and former Soviet 
countries.15 In a handful of countries, the survey mode changed over the past 
decade, switching from face-to-face to telephone survey administration and vice versa 
(indeed, some countries - such as Iraq, Türkiye and Malaysia - have switched multiple
times). Figure 3.7 shows that negative affect balance rose (meaning worsening
mental health) in eight of 11 countries after switching from in-person to telephone
interviews (Panel A); similarly, negative affect balance fell (an improvement) in all
four countries after switching from telephone to in-person interviews (Panel B). This
runs counter to the findings of Holbrook, Green and Krosnick (2003[118]), and suggests
instead that respondents may be more influenced by social desirability bias when
speaking to an interviewer in person, and thus over-report good well-being in
face-to-face surveys.16
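
As a rough sketch of the comparison behind Figure 3.7, the share of respondents with a negative affect balance can be computed per year and then averaged on either side of a mode switch. The field names, the three-year window and the choice to start the "after" window in the switch year are assumptions made for illustration; the data are invented and survey weights are ignored.

```python
def negative_affect_share(respondents):
    """Share of respondents who report more negative than positive emotions."""
    flagged = sum(1 for r in respondents if r["negative"] > r["positive"])
    return flagged / len(respondents)


def mode_change_shift(share_by_year, switch_year, window=3):
    """Mean share after a mode switch minus the mean share before it.

    Assumption: the "after" window starts in the switch year itself. Years
    without data are simply absent, so shorter averages are used when needed.
    """
    before = [s for y, s in share_by_year.items()
              if switch_year - window <= y < switch_year]
    after = [s for y, s in share_by_year.items()
             if switch_year <= y < switch_year + window]
    return sum(after) / len(after) - sum(before) / len(before)


# Hypothetical respondents: number of negative and positive emotions
# reported for the previous day.
sample = [
    {"negative": 2, "positive": 1},
    {"negative": 0, "positive": 3},
    {"negative": 1, "positive": 1},
    {"negative": 3, "positive": 0},
]
share = negative_affect_share(sample)

# Hypothetical yearly shares for one country that switched from in-person
# to telephone interviewing in 2018.
shares = {2015: 0.10, 2016: 0.12, 2017: 0.11, 2018: 0.15, 2019: 0.16, 2020: 0.17}
shift = mode_change_shift(shares, switch_year=2018)
print(f"share with negative affect balance: {share:.2f}")
print(f"change after mode switch: {shift:+.3f}")
```

A positive shift means a larger share reporting a negative affect balance after the switch; on its own it cannot separate a mode effect from a genuine change in well-being.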


Figure 3.7. Shifts from in-person to telephone-administered surveys are
associated with deteriorations in negative affect balance

[...] three years following a mode change); exceptions are made for countries that do not
have sufficient years of data collection on either side of a mode switch. In those
instances, one- or two-year averages are shown instead. IRQ and ARE did not collect
negative affect data in 2013, the year of mode change.

Source: OECD calculations based on the Gallup World Poll (n.d.[120]) (database),
https://www.gallup.com/178667/gallup-world-poll-work.aspx.
StatLink https://stat.link/ngkd9r

Box 3.11. Key messages: Mode effects 


• Respondents are more likely to report worse mental health outcomes when
surveys are self-administered, as compared to interviewer-administered.


• When surveys are interviewer-administered, there is conflicting evidence as to
whether mental health outcomes are subject to mode effects.


• Consistency in mode is encouraged; when the survey mode changes between
data collection rounds, this should be explicitly stated.


What additional protocols or procedures should data collectors take on 
board for mental health modules? 


Interviewer training is crucial to the quality of responses in any survey. However, the 
measurement of mental health raises additional issues, because of the sensitive 
nature of the subject matter. Although a body of trained interviewers will generally 
contribute to higher response rates and better responses, interviewers may struggle 
to garner responses to questions if they cannot explain adequately to respondents 
why collecting such information is important and how it will be used. In some cases, 
respondents may fail to understand why a public agency might want to collect this 
type of information. To manage risks around respondent attitudes to questions on 
mental health, it is imperative that interviewers are well-briefed, not just on what 
concepts the questions are trying to measure, but also on how the information 
collected will be used. This is essential in order for interviewers to build a strong 
rapport with respondents, which can help to improve response rates along with the 
quality of those responses. 


A respondent’s relationship with the interviewer matters. In a study conducted with a 
group of mental health service users in a clinical setting, respondents emphasised 
that a questionnaire was “only as good as the doctor who uses it” (Crawford et al., 
2011[111]). In fact, users stated that the interviewer mattered most - more than 
either the content or length of the survey. A study in the United States found that 
respondents were more likely to disclose sensitive information about illegal drug use 
in a face-to-face interview, as opposed to over the phone, with the difference more 
pronounced for Black compared to white Americans (Aquilino, 1994[121]). A 
Norwegian study found minimal impact of interviewer gender and age on reported 
mental health symptoms, but noted that young male interviewers received fewer 
reports of symptoms as compared to interviewers of other ages and/or female 
interviewers (Moum, 1998[115]). This can function in the opposite direction as well, 
with a strong interviewer-respondent bond leading to more information being 
disclosed.17 


Recent research has shed more light on the need to involve those with lived 
experience in the survey design and data collection process, by building a pipeline of 
researchers with psychiatric disabilities and/or lived experience of mental health 
conditions (Jones et al., 2021[122]; Banfield et al., 2018[123]; Hancock et al., 
2012[124]). There is a strong basis of evidence showing that peer-interviewing 
techniques - drawing enumerators from the same community as interviewees - can 
be an effective way of improving trust between interviewer and interviewee, helping 
to collect high-quality data for hard-to-reach population groups (Dewa et al.,
2021[125]; Warr, Mann and Tacticos, 2011[126]; Hancock et al., 2012[124]).
Furthermore, in the mental health context, the involvement in research of those with 
lived mental health experience can improve both the credibility of findings and the 
likelihood of their adoption into policy (Scholz et al., 2021[127]). 


Questions on suicide or suicidal ideation require careful consideration and
well-designed procedures to provide needed support to respondents who are at
risk (Lakeman and FitzGerald, 2009[128]). The final item of the PHQ-9 asks about
suicidal thoughts and ideation, and for this reason it is often excluded from population 
surveys. In both the European Health Interview Survey (EHIS) and the United States’ 
National Health Interview Survey (NHIS), the PHQ-8 is used instead, for precisely this 
reason. 


For countries that do include questions on the topic of suicide, additional protocols are 
often employed. For example, the Australian non-suicidal self-injury (NSSI) prevalence 
study dealt with the sensitive nature of the survey questions by sharing in advance a 
large amount of information with the households to be interviewed; this helped to
alleviate ethical concerns, and did lead to lower non-response rates (Taylor
et al., 2011[129]). In implementing the Mental Health and Access to Care Survey
(MHACS), the Canadian government provides mental health resources to both 
respondents and interviewers; enumerators are also provided with employee support 
services to help them navigate stress or ill effects to their own mental health that may 
be induced by administering the questionnaire (response to an OECD questionnaire, 
2022). 


Box 3.12. Key messages: Interviewer training 


• Respondents are more likely to participate in an interview, and answer
truthfully, if they feel comfortable with the interviewer. Enumerator training should focus
on building rapport and trust with respondents.


• Careful procedures and support practices must be in place if surveys are to
include questions surrounding suicide and suicidal ideation; in the case of household
surveys, it may be best practice to avoid these types of questions on ethical grounds.


Analysis 


Once mental health data are collected, as accurately and consistently as possible, 
they are only useful for policy makers when used in analysis. Many of the data 
described in this report are not binary, meaning the outcome variable is measured on 
a scale. This is true for the screening tools and composite scales, which contain a 
number of items, coded and scored accordingly, ranging from worst to best possible 
mental health. General mental health status tools are typically single questions; 
however, answer options are not binary and typically use a Likert scale (the exact
number and phrasing of answer options vary across scales; see Table 2.11).


What are the trade-offs between using cut-off scores vs. continuous 
measures? 


Having a continuous outcome measure for mental health has many benefits, including
that it provides more detailed and nuanced information about population mental
health. Further, when the distribution of responses is normal, without floor or ceiling
effects,18 this allows for parametric analysis of outcomes, which enables researchers
to better analyse the impacts of given policies or interventions. For example, research
into screening tools for positive mental health has shown that data sourced from
(S)WEMWBS have a distribution that more closely approximates a normal distribution
than do screening tools that focus on specific mental illnesses (Shah et al.,
2021[17]).19 Indeed, this can be seen visually in Figure 3.8, which shows density plots
for the three mental health tools included in the tenth wave of the UKHLS survey:
SWEMWBS; the mental health component of the SF-12 (MHC-12); and the GHQ-12. Of
the three, SWEMWBS most closely approximates a normal distribution, followed by the
SF-12, while the GHQ-12 shows significant floor effects. Separate research into the
MHI-5 has shown that it is positively skewed; it is better able to distinguish between
those with worse mental health than between those with higher levels of positive
mental health (Elovainio et al., 2020[6]; Thorsen et al., 2013[130]).
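
Floor effects and skew of the kind described above can be checked numerically before choosing a parametric analysis. The following is a minimal sketch: the scores are invented GHQ-12-style values on a 0-12 range, the function names are hypothetical, and moment-based skewness is only one of several possible diagnostics.

```python
def floor_ceiling_shares(scores, minimum, maximum):
    """Shares of respondents at the lowest and highest possible score."""
    n = len(scores)
    at_floor = sum(1 for s in scores if s == minimum) / n
    at_ceiling = sum(1 for s in scores if s == maximum) / n
    return at_floor, at_ceiling


def skewness(scores):
    """Moment-based skewness: 0 if symmetric, > 0 if the right tail is longer."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((s - mean) ** 2 for s in scores) / n
    m3 = sum((s - mean) ** 3 for s in scores) / n
    return m3 / m2 ** 1.5


# Hypothetical scores on a 0-12 scale where 0 is the best possible outcome.
ghq_like = [0, 0, 0, 0, 0, 0, 1, 1, 2, 3, 5, 12]

floor_share, ceiling_share = floor_ceiling_shares(ghq_like, minimum=0, maximum=12)
print(f"share at floor: {floor_share:.2f}, share at ceiling: {ceiling_share:.2f}")
print(f"skewness: {skewness(ghq_like):.2f}")
```

A large share of respondents at the minimum score, combined with strong skew, is a signal that methods assuming normality may be a poor fit for that tool.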


Figure 3.8. Positive mental health scales may better approximate a
normal distribution

Note: Density plots showing the weighted scores for: Panel A: SWEMWBS (ranging from
9.5, worse mental well-being, to 35, better mental well-being); Panel B: the mental
health component of the SF-12 (ranging from 0, low functioning, to 100, high
functioning); and Panel C: the GHQ-12 (ranging from 0, better mental health, to 12, worse
mental health). Data for SF-12 and the GHQ-12 are from waves 2 to 10
of Understanding Society; data for SWEMWBS come from waves 4, 7 and 10. Refer to
Annex 2.B for more information on individual screening tools.

Source: OECD calculations based on University of Essex, Institute for Social and Economic
Research (2022[70]), Understanding Society: Waves 1-11, 2009-2020 and Harmonised
BHPS: Waves 1-18, 1991-2009 (database). 15th Edition. UK Data Service. SN:
6614, http://doi.org/10.5255/UKDA-SN-6614-16, from wave 10 only (Jan 2018 - May
2020).


While normal distributions are useful for regression analysis, in order for mental
health information to be useful at either a micro-level (e.g. a primary care physician
conducting a screening interview to see if a patient is at risk and requires more
support) or macro-level (e.g. a government office tasked with tracking changes in risk
over time), it is often useful to use cut-off scores to group outcomes into categories.
These categories vary depending on the screening tool and scoring convention used,
but typically encompass things such as “at risk for depression”, “at risk for anxiety”,
“major depressive disorder”, “severe psychological distress”, “psychological
flourishing”, etc. These categories can also be useful in analysis to understand how
mental health interacts with other aspects of well-being: for example, the share of the
employed or unemployed who are experiencing anxiety, or the quality of social
connections for those at risk for depression compared to those who are not (OECD,
2021[57]).


One general criticism of the use of thresholds is that they can be arbitrary. However,
in the case of mental health screeners, thresholds are established through a rigorous
validation process; researchers use receiver operating characteristic (ROC) analysis to
determine which cut-off score maximises both the sensitivity and the specificity of the
measure (see Box 3.3).20 Cut-off scores are also useful in that they convert responses
to a series of screening tool questions into something comparable to the results of an
in-depth diagnostic interview: risk for a certain mental health condition. The PHQ,
GAD, Kessler and CES-D surveys all have standard, validated cut-off scores (Kessler
et al., 2002[89]; Kroenke et al., 2007[44]; Moriarty, Zack and Kobau,
2003[21]; Manea, Gilbody and McMillan, 2015[131]; Kroenke et al., 2009[12]; Spitzer
et al., 2006[11]). The traditional CES-D cut-off score indicative of a “depressive case” in
clinical samples is 16, but this threshold has been known to produce a high rate of
false positives in non-clinical samples (Eaton, 2004[132]; Santor and Coyne,
1997[133]). The GAD-7 also has established cut-off scores, but studies have found
that it performs better at identifying the share of the population at risk for generalised
anxiety disorder and less well at picking up on other types of anxiety disorders, such
as social anxiety disorder (Beard and Bjorgvinsson, 2014[134]; Sunderland et al.,
2019[33]).
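
The ROC-based selection of a cut-off can be sketched as below. The text does not state which criterion validation studies use to balance sensitivity and specificity; Youden's J (sensitivity + specificity - 1) is one common choice and is assumed here. The screening scores and diagnostic labels are invented, and the function names are hypothetical.

```python
def sensitivity_specificity(scores, has_condition, cutoff):
    """Sensitivity and specificity when 'score >= cutoff' counts as a positive."""
    pairs = list(zip(scores, has_condition))
    tp = sum(1 for s, d in pairs if s >= cutoff and d)
    fn = sum(1 for s, d in pairs if s < cutoff and d)
    tn = sum(1 for s, d in pairs if s < cutoff and not d)
    fp = sum(1 for s, d in pairs if s >= cutoff and not d)
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoff(scores, has_condition):
    """Candidate cut-off with the highest Youden's J (assumed criterion)."""
    def youden(cutoff):
        sens, spec = sensitivity_specificity(scores, has_condition, cutoff)
        return sens + spec - 1
    return max(sorted(set(scores)), key=youden)


# Hypothetical screener scores and the result of a diagnostic interview
# (True = condition present) for the same ten respondents.
scores = [2, 4, 5, 7, 9, 10, 11, 12, 14, 18]
diagnosed = [False, False, False, False, False, True, False, True, True, True]

cutoff = best_cutoff(scores, diagnosed)
sens, spec = sensitivity_specificity(scores, diagnosed, cutoff)
print(f"cut-off: {cutoff}, sensitivity: {sens:.2f}, specificity: {spec:.2f}")
```

Because the chosen threshold depends on the validation sample, a cut-off derived in a clinical sample can behave differently in the general population, which is consistent with the CES-D false-positive finding noted above.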


Though the GHQ-12 is commonly used to screen general mental health conditions, it
has been found to generate a high level of false positives; one study found that as
many as half of those identified as having a mental disorder were false positives
(positive predictive value of 0.53) (Schmitz, Kruse and Tress, 2001[4]). Other mental
ill-health screening tools like the MHI-5 or GHQ-12 were not developed with a standard
validated cut-off to define a case of common mental disorder. Although these scales
may not have an internationally comparable cut-off score, they have been validated in
several studies. For instance, Berwick et al. (1991[38]) validated the MHI-5 as a
measure for depression using clinical interviews as the gold standard and reported
52 as the optimal cut-off score.21 Subsequent research has corroborated the finding that
the MHI-5 performs well as a screener for depression and general mood disorders but
much less well as a measure for anxiety, somatoform disorders and substance use
disorders (Rumpf et al., 2001[135]; Strand et al., 2003[7]; Thorsen et al., 2013[130]).


Some tools have multiple accepted cut-off scores, depending on the intended
diagnosis, meaning varying scoring conventions can lead to different prevalence
estimates. Figure 3.9 shows the density plot for PHQ-8 scores, ranging from 0 (least at
risk for depression) to 24 (most at risk) for 22 European OECD countries. The vertical
lines show different validated thresholds. A score of 10 or above indicates risk for
major depressive disorder (shown in black) (Kroenke et al., 2008[136]). Other
threshold categorisations deem a score of 5-9 as risk for mild depression, 10-14 as
moderate, 15-19 as moderately severe, and 20+ as risk for severe
depression (Kroenke, Spitzer and Williams, 2001[137]). Another scoring convention
(not shown in Figure 3.9), used by Eurostat, is not based purely on the raw score but 
rather defines major depressive symptoms by respondent answers to individual 
questions.22 All three measures lead to different prevalence estimates from the same 
underlying dataset: (1) 6.9% at risk for major depressive disorder; (2) 15.2% at risk for
mild depression, 4.4% at risk for moderate, 1.7% at risk for moderately severe and
0.8% at risk for severe; and (3) 3.1% with major depressive symptoms.23
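
To illustrate how two score-based conventions yield different headline figures from the same data, the sketch below applies a single threshold and a set of severity bands to one list of PHQ-8-style scores. The scores are invented and unweighted, the names are hypothetical, and the item-based Eurostat convention is not reproduced because its rules are not given in the text.

```python
# Severity bands as (label, lower bound inclusive, upper bound exclusive).
SEVERITY_BANDS = [
    ("mild", 5, 10),
    ("moderate", 10, 15),
    ("moderately severe", 15, 20),
    ("severe", 20, float("inf")),
]


def share_at_or_above(scores, cutoff):
    """Share of respondents scoring at or above a single cut-off."""
    return sum(1 for s in scores if s >= cutoff) / len(scores)


def severity_shares(scores):
    """Share of respondents falling in each severity band."""
    n = len(scores)
    return {label: sum(1 for s in scores if low <= s < high) / n
            for label, low, high in SEVERITY_BANDS}


# Hypothetical scores for 20 respondents.
phq8_like = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9, 10, 12, 15, 17, 21]

at_risk = share_at_or_above(phq8_like, cutoff=10)
bands = severity_shares(phq8_like)
print(f"score of 10 or above: {at_risk:.2f}")
for label, share in bands.items():
    print(f"{label}: {share:.2f}")
```

Since the bands from "moderate" upwards partition the scores of 10 or above, their shares sum to the single-threshold share; this identity is a useful consistency check when both conventions are reported from the same dataset.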


Although there is no clinical gold standard for psychological well-being, positive
mental health composite scales have also developed cut-off scores at the request of
users. Two main approaches have been put forward for (S)WEMWBS: one statistical
and the other benchmarking.24 In the first, researchers recommend cut-off points at
+/- one standard deviation, which result in approximately 15% of the population
having high well-being and 15% having low well-being. In the second approach, cut-off
scores for (S)WEMWBS are benchmarked against measures capturing symptoms of
depression and anxiety (see below for a more detailed discussion of positive mental
health tools being used as screeners for mood disorders). Studies have benchmarked
WEMWBS against the CES-D, PHQ-9 and GAD-7 to suggest cut-off points on the
WEMWBS scale that indicate risk for probable clinical depression, possible depression
or mild depression (and/or anxiety). Taking all of this together, researchers suggested
that a cut-off point of 60 on the WEMWBS scale, and of 28 on the SWEMWBS scale,
can be used to identify the top 15% of those with high mental well-being, but caution
that because there is no clinical measure of high mental well-being these thresholds
are by definition arbitrary (Warwick Medical School, 2021[78]).
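
The statistical approach can be sketched as follows. Under an exactly normal distribution, the share lying more than one standard deviation below (or above) the mean is about 16%, which is in line with the approximate 15% figure quoted above. The scores are invented SWEMWBS-style values, the function names are hypothetical, and the use of the population rather than sample standard deviation is an assumption.

```python
from statistics import NormalDist, mean, pstdev


def sd_cutoffs(scores):
    """Cut-off points at one standard deviation below and above the mean."""
    centre, spread = mean(scores), pstdev(scores)
    return centre - spread, centre + spread


def classify(score, low_cut, high_cut):
    """Assign a score to low, average or high well-being."""
    if score < low_cut:
        return "low"
    if score > high_cut:
        return "high"
    return "average"


# Share expected below -1 SD if scores were exactly normally distributed.
expected_tail = NormalDist().cdf(-1)
print(f"expected share in each tail: {expected_tail:.3f}")

# Hypothetical well-being scores for ten respondents.
scores = [14, 18, 20, 21, 22, 23, 24, 25, 27, 32]
low_cut, high_cut = sd_cutoffs(scores)
groups = [classify(s, low_cut, high_cut) for s in scores]
print(f"cut-offs: {low_cut:.1f} and {high_cut:.1f}")
print(groups)
```

Cut-offs of this kind are relative to the sample they are computed from, so they describe position within a distribution rather than a clinical state.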


The MHC-SF comprises three subscales (emotional, social and psychological well- 
being), which can be scored to group individuals into three categories: flourishing, 
languishing, and for those in neither of the previous two categories, moderately 
mentally healthy (Lamers et al., 2011[20]). However, surveys in Canada and Denmark
have found that the majority of the population scores highly enough to be categorised
as flourishing, which runs counter to the theory that flourishing and languishing 
represent a minority of the population and are deviations from the average. This 
suggests that more conservative scoring criteria could be warranted to improve the 
sensitivity of the measure (Santini et al., 2020[48]). 


Figure 3.9. Different scoring conventions can lead to different estimates
of prevalence
Density plots showing distribution of risk for depression (PHQ-8), European
OECD 22, 2014

Note: Weighted density plot for PHQ-8 scores in 22 European OECD countries; scores
range from 0 (lowest risk) to 24 (highest risk for depression). Vertical lines indicate
validated cut-off scores as established in the literature: as shown by the bold black
vertical line, a score of 10 or above indicates risk for major depressive disorder (Kroenke
et al., 2008[136]); as shown by the dotted red vertical lines, a score of 5-9 indicates risk for
mild depression, 10-14 for moderate, 15-19 for moderately severe, and 20+ for
severe depression (Kroenke, Spitzer and Williams, 2001[137]).

Source: OECD calculations based on European Health Interview Survey (EHIS) wave 2
data (n.d.[66]) (database),
https://ec.europa.eu/eurostat/statistics-explained/index.php?title=Glossary:European_health_interview_survey_(EHIS).

Box 3.13. Key messages: Continuous measures vs. cut-off scores 


• Mental health tools that provide a continuous, as opposed to binary, outcome
variable provide a more nuanced view of population mental health.


• Evidence suggests that positive mental health composite scales better
approximate a normal distribution than do measures of psychological distress.


• Cut-off scores provide researchers and policy makers with clear categories of
who is at risk and who is not. While cut-off scores for mental ill-health screening tools
have been validated against clinical diagnoses to maximise their sensitivity and
specificity, no such gold standard exists for psychological flourishing.


• Despite best efforts to ensure sensitivity and specificity, cut-off scores may
provide false positives or be ill-suited for some population groups.


• Different scoring conventions for the same screening tool can lead to different
prevalence estimates; therefore, care should be taken to ensure consistency.


Conclusion


The questions addressed in this chapter are important to consider when thinking 
about which tools are best for measuring population mental health. As with any 
survey data, a number of challenges exist, and care is needed when interpreting 
changes over time and across groups. More specific to mental health, stigma
and discriminatory views can contribute to bias in reported data. Furthermore, it is 
important to integrate the perspective of those with lived experience in survey design 
to ensure the quality and policy relevance of data. However, the evidence reviewed in 
this chapter shows that existing mental health tools provide useful and policy-relevant 
outcomes. Given the increasing urgency of the mental health crisis and the 
prioritisation of action on this front by governments, collecting high-quality mental 
health data following existing good practice is all the more important. Ongoing
research into open questions of measurement can then progress in tandem with the 
continual monitoring of population mental health. 


All OECD countries are currently measuring population mental health in one form or 
another and are already making cross-group and longitudinal comparisons. While 
additional research is needed to test the sensitivity of some tools to change over time, 
the high-frequency data collected during the COVID-19 pandemic showed that many 
mental health measures are indeed sensitive to change. Whereas policy discussions 
prior to COVID-19 sometimes emphasised that rates of common mental health 
conditions like generalised anxiety disorder and depressive disorders had remained 
stable in recent years, there is now broad consensus that the pandemic caused a 
dramatic increase in rates of psychological distress over the first two years - and 
these spikes have been captured in the data collected in OECD countries. 


The task ahead is to better harmonise data collection and provide recommendations 
for quality improvement for initiatives already underway. The results of a 2022 OECD 
questionnaire, showcased in Chapter 2, illustrate remaining gaps in the type of mental 
health outcomes collected by countries: an absence of a harmonised approach to 
measure symptoms of anxiety, a lack of standardisation in affective and eudaimonic 
tools, and very uneven use of tools that measure non-depression, non-anxiety types 
of specific mental health conditions. The recommendations for these areas made 
below take into account the practical considerations of data collectors, noting the 
need to keep any new survey items short.


Based on a comparative assessment of the statistical quality of different tools, their 
response burden and cost (proxied by item length) as well as information on existing 
data collection practices (Table 3.1), the report recommends the inclusion of 
specific mental health outcome measurement tools for national statistical 
offices to adopt in household, social and health surveys. These 
recommendations do not imply the phasing out of other tools that OECD countries are 
already using to capture population mental health outcomes, particularly with regard 
to previous diagnoses and experienced symptoms, or measures from the 2013 OECD 
Guidelines on Measuring Subjective Well-being (such as life satisfaction). Rather, they
offer a small set of instruments on which a more internationally harmonised set of
population mental health outcome indicators could be built: 


• Mental ill-health - priority recommendation: The Patient Health Questionnaire-4
(PHQ-4) could be included in more frequent surveys, alongside the regular collection
of the PHQ-8 or PHQ-9 in health surveys. The PHQ-4 measure combines two
depression questions from the longer PHQ-9 scale and two anxiety questions from the
GAD-7 screening tool. It covers both depression and anxiety, rather than focusing on
only one of these two most common mental health conditions. Furthermore, it does so
with only four questions, keeping the module relatively short and with a low response
burden. 81% of OECD countries are already implementing the PHQ-8 or PHQ-9,
meaning there are trend data to which the depression questions in the PHQ-4 could be
linked.25 The PHQ-8/9 and the GAD-7 could be retained in specific health surveys,
while the PHQ-4 could be introduced in general, more frequent surveys, given its
shorter length.


• Positive mental health - recommendation: Either the WHO-5 or SWEMWBS
could be used to measure affective and eudaimonic aspects of
positive mental health in a standardised way across countries. These suggestions are
mainly based on trends in country measurement practice. The WHO-5 is well suited to
measuring positive affect in that it is relatively short and easy to implement, is
included in the OECD’s Subjective Well-being Guidelines as an experimental affect
module (OECD, 2013[1]), has been translated into many languages, and has been
found to be reliable and valid. Although currently used by only 16% of OECD
countries, it has been recommended for use by other OECD projects, including as a
part of a conceptual framework for measuring the non-financial performance of
firms (Siegerink, Shinwell and Zarnic, 2022[138]) and in an effort to use
patient-reported indicator surveys (PaRIS) to centre health care delivery on the outcomes that
matter to patients (de Bienassis et al., 2021[139]). SWEMWBS is a more
comprehensive tool, in that it covers affective, eudaimonic, and social connections
aspects of positive mental health. This makes it slightly longer than the WHO-5,
though only by two questions. SWEMWBS - or the longer 14-question WEMWBS - has
been adopted by 19% of OECD countries. For countries already active in subjective
well-being or positive mental health measurement, some of the indicator items within
SWEMWBS may overlap with existing data collection efforts to measure concepts such
as life evaluation and the quantity and quality of social connections (see OECD
(2020[140]) and OECD (2013[1]) for existing OECD recommendations and examples).
In these instances, the WHO-5 may be more suitable in that it covers only affect. The
topic of measuring affect and eudaimonia specifically will continue to be explored in
future OECD workstreams on subjective well-being.


° General mental health status - recommendation: A single question about a 
respondent’s general mental health status could be included in a range of 
different surveys across a country’s entire data infrastructure system. Single general 
mental health questions have less of an evidence base compared to established 
screening tools, but the findings that do exist suggest it is a useful and meaningful 
measure. Many OECD countries already collect data on self-reported physical health, 
thus in question framing it will be important to distinguish between self-reported 
physical vs. self-reported mental health. Some OECD countries currently 
collect a general self-reported health measure that captures both physical and mental 
health; we recommend separating these measures out. In order for this to happen in 
an internationally comparable way, more research and coordination must happen to 
align existing country efforts. Canada has been an early adopter of this approach, and 
its framing as a self-reported mental health (GRMH) question-and-answer option has 
already been adopted by Chile and Germany; furthermore, much of the existing 
evidence base on these types of questions has been produced in Canada. Other 
countries interested in adding such an item to surveys may wish to use this 
framing as well: “In general, how is your mental health? Excellent / Very good / Good / 
Fair / Poor.” 


Currently, very few countries are using tools to collect information on mental health 
conditions beyond depression and anxiety, such as substance use disorders, PTSD, 
obsessive compulsive disorder, eating disorders, bipolar disorder, etc. There are 
exceptions - these outcomes are covered by all countries that use structured 
interviews (see Table 2.4), and France and Slovenia, among a few others, have 
implemented detailed survey modules with tools that capture diagnoses and 
symptoms of these conditions. There is value in measuring these concepts as distinct 
conditions, rather than as a part of general mental ill-health; the statistical 
agenda moving forward could therefore focus on developing recommendations in this space. 


As a general point to note, it is more informative, and thus a better use of limited 
resources, to diversify across tool types and mental health outcome measures, rather 
than to implement a variety of iterations of the same type of tool. For example, rather 
than implementing a range of different screening tools to capture depression/anxiety 
across a country’s survey infrastructure, it would be of greater use to harmonise 
within tool areas. This might mean choosing a single depression/anxiety 
screening tool, then supplementing it with single-item question tools to capture received 
diagnoses, experience of symptoms and so on. 


Above all, this report has highlighted the importance of precision when 
communicating outcome measures. Each tool measures a specific, slightly different 
facet of population mental health. Furthermore, individual tools can be scored in a 
variety of ways, each leading to different estimates for mental health outcomes. This 
speaks to the need for greater harmonisation, but also for clearer communication in 
terms of stating what is meant by mental health and how it is measured. This is all the 
more important given the rise of mental health to the top of national agendas in the 
years following the pandemic. 
Notes 


1. Some screening tools contain questions relating to experienced symptoms. 
However, we differentiate the category of “screening tools” from that of “experienced 
symptoms” in that the former are set piece instruments, validated against clinical 
diagnoses for mental health conditions, while the latter are general, non-standardised 
question formulations asking respondents whether, for example, they 
“currently experience symptoms of PTSD” or “suffer from chronic anxiety” (see Table 
2.7). Refer to Chapter 2 for an extended discussion on different instrument types. 


2. This chapter covers only four composite scales of positive mental health: the SF-12, 
the WHO-5, MHC-SF and (S)WEMWBS. Measurement guidelines for life evaluation, 
affect and eudaimonic aspects of positive mental health and subjective well-being are 
covered in depth in (OECD, 2013[1]). 


3. By construction, screening tools with fewer items will have lower values of 
Cronbach's alpha. (Recall that Cronbach's alpha is a function of, among other things, 
the total number of items in a scale.) This again underscores the importance of 
weighting all facets of statistical quality together, rather than placing high importance 
on any single test. 
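
The dependence of Cronbach's alpha on scale length can be illustrated with the standardised form of the coefficient, alpha = k * r / (1 + (k - 1) * r), where k is the number of items and r is the mean inter-item correlation. The following is a minimal sketch: the function name is hypothetical, and the mean inter-item correlation of 0.5 is an illustrative assumption, not an estimate from this report.

```python
def standardised_alpha(n_items: int, mean_inter_item_r: float) -> float:
    """Standardised Cronbach's alpha for a scale with n_items items."""
    if n_items < 2:
        raise ValueError("alpha is only defined for scales with two or more items")
    return n_items * mean_inter_item_r / (1 + (n_items - 1) * mean_inter_item_r)


# Illustrative assumption: the same mean inter-item correlation at every scale length.
MEAN_R = 0.5

for n_items in (2, 4, 8, 9):
    # Alpha rises with the number of items even though item quality is unchanged.
    print(n_items, round(standardised_alpha(n_items, MEAN_R), 3))
```

Holding the mean inter-item correlation constant, a two-item scale yields a lower alpha than an eight- or nine-item scale, which is why alpha alone should not be used to rank tools of different lengths.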


4. The two other anxiety scales against which the GAD-7 and GAD-2 were tested for 
convergent validity were the Beck Anxiety Inventory (BAI) and the anxiety subscale of 
the Symptom Checklist-90 (SCL-90). 


5. Rasch analysis uses psychometric models to analyse categorical data and 
identify and measure latent attitudes or characteristics. 


6. Although stigma and low levels of mental health literacy are strong drivers of 
non-response rates for mental health survey items, other factors - such as low levels 
of institutional trust, lack of motivation or sufficient time to participate, language 
barriers, poor health - may also contribute to low response rates (Lowthian and Lloyd, 
2020[162]). 


7. Strong confidentiality assurances can reduce non-response rates for sensitive 
subjects (Singer, Von Thurn and Miller, 1995[58]); however, they can in fact 
increase non-response rates for non-sensitive topics, as respondents are primed 
to then expect threatening or sensitive questions following an in-depth data 
confidentiality explanation and can be put off the interview (Singer, Hippler and 
Schwarz, 1992[156]). 


8. The correlations between risk for depressive disorders and stigma as measured 
through anti-stigma indicators - the share who agree that seeking treatment for 
mental disorders is a sign of strength, and the share who agree that mental illness is 
an illness like any other - show the reverse relationship, with prevalence lower in 
places with less bias. However, these correlations are not significant. 


9. Cross-country and cross-group comparability are not trivial measurement issues, 
and some previous OECD work has dealt with the challenge of cross-country 
comparisons by assigning the bottom quintile of the population as at risk for mental 
distress, based on evidence from epidemiological studies stating that 20% of the 
population experiences some form of mental disorder in a given 12-month period 
(OECD, 2021[57]). However, this approach by definition assigns constant 
prevalence, which especially in the aftermath of the COVID-19 pandemic - which saw 
governments across the OECD struggling to deal with huge spikes in population 
mental distress, depression, anxiety and stress - is limiting. 
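
The limitation described in this note can be made concrete with a small sketch: when the at-risk group is defined as the bottom quintile of a score distribution, the estimated prevalence remains 20% even if every respondent's score deteriorates. The scores and the function name below are hypothetical, and higher scores are assumed to indicate better mental health.

```python
def bottom_quintile_share(scores):
    """Flag the lowest-scoring fifth of respondents and return the flagged share."""
    if not scores:
        raise ValueError("at least one score is required")
    n_flagged = len(scores) // 5      # size of the bottom quintile
    ranked = sorted(scores)           # lowest (worst) scores first
    flagged = ranked[:n_flagged]
    return len(flagged) / len(scores)


before = [52, 60, 64, 68, 72, 76, 80, 84, 88, 92]   # hypothetical scores
after = [score - 20 for score in before]            # every respondent is worse off

# The flagged share is identical in both periods, despite the uniform deterioration.
print(bottom_quintile_share(before), bottom_quintile_share(after))
```

A fixed score threshold, by contrast, would register the shift, which is why relative definitions are of limited use for tracking change over time.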


10. The geographic range where the WHO-5 has been used in surveys 
encompasses: Africa (Algeria, South Africa), Asia (Bangladesh, China, India, Japan, 
South Korea, Sri Lanka, Taiwan, Thailand), Europe (Northern, Southern, Eastern, 
Western and Central Europe), the Americas (Canada, the United States, Brazil, 
Mexico), the Middle East (Israel, Iran, Lebanon) and Oceania (Australia, New 
Zealand) (Topp et al., 2015[49]). 


11. These emerging methods rely heavily on the application of modern 
psychometric methods, such as item response theory (IRT), to improve the validity, 
accuracy, comparability and efficiency of mental health scales, which have in turn 
shown substantial promise in the advanced analysis of cross-cultural differences. 
Using IRT-based differential item functioning as well as the use of item anchoring or 
equating, new methods are able to adjust for any significant bias (Dere et al., 
2015[149]; Gibbons and Skevington, 2018[141]; Vaughn-Coaxum, Mair and Weisz, 
2016[155]). Similarly, new IRT models have emerged that can estimate and correct 
for extreme response styles more effectively than classical methods and quantify the 
tendency of extreme responding on a particular scale (Dowling et al., 2016[150]; Jin 
and Wang, 2014[152]). Some of these new methods include item banking, adaptive 
testing, data-driven short scales and scale equating. 


12. While most international research has confirmed this rising trend of mental 
ill-health (OECD, 2021[98]; Santomauro et al., 2021[161]), evidence from individual 
countries at times shows slightly different trajectories of mental health outcomes. A 
German study found that the prevalence of depression fell in the first year of the 
pandemic, but began rising by October 2020 and subsequently increased further over 
the course of 2021 and 2022 (Hapke et al., 2022[158]; Mauz et al., 2022[159]). An 
epidemiological study in Norway found that the prevalence of mental disorders 
decreased slightly in the early days of the pandemic (May 2020), before returning to 
pre-pandemic levels by September 2020 - suggesting relatively stable levels of 
mental disorders (Knudsen et al., 2021[160]). This mirrors findings from a meta- 
analysis of 65 studies from early 2020 which showed only a small average increase in 
mental health symptoms in March and April 2020 that had abated by July. Both 
studies concluded by early Q3 2020, leaving open the possibility that an extension of 
the research might unveil findings similar to that of Germany - little to no change in 
the early days of the pandemic, but rises in distress by late 2020 and 2021. 


13. These patterns also exist for physical health outcomes. A joint United States and 
Canada study found that self-administered respondents were more likely to report 
lower health-related quality-of-life (HRQoL) outcomes than did interviewer- 
administered telephone survey respondents (Hanmer, Hays and Fryback, 2007[142]); 
another study in Spain found that respondents reported better physical health 
outcomes, measured by the SF-36, when surveys were administered by 
interviewers (Garcia et al., 2005[143]). 


14. CCHS surveys include both computer-assisted personal interviews (CAPI) and 
computer-assisted telephone interviews (CATI). Between 2001 and 2003, the survey 
changed the ratio of CATI to CAPI interviews, which allowed researchers to study how 
mode effects affected the comparability of CCHS data across rounds. They found 
differences in health indicator outcomes by mode: those interviewed in person 
reported higher obesity rates and were more likely to be inactive, to smoke and to 
report contacts with medical doctors. However, self-reported mental health showed no 
mode effects (St-Pierre and Béland, 2004[117]). 


15. In 2020 and 2021, many countries that still use face-to-face data collection 
switched to telephone surveys due to the COVID-19 pandemic. These mode shifts 
have not been included in the figure, as mental health outcomes in these years would 
be heavily influenced by the global pandemic. 


16. However, it is impossible to disentangle mode effects from the socio-political 
events that may have prompted Gallup to change modes in the first place, which 
would be expected to exert an influence on underlying mental health. Many of the 
mode switches highlighted in this figure take place in countries that experienced 
significant political disruptions, or incidents of violence, that likely informed Gallup’s 
choice to change the mode of data collection. For example, mode 
switches in Türkiye coincide with the 2016 attempted coup; the mode switch in 2013 
in Iraq coincides with a ramping up of ISIS activity in the region; the mode switch in 
Libya coincides with the start of the second civil war; and so on. All of these events 
have a real impact on population negative affect balance and would very likely drive 
some of the changes shown in the figure, independently of mode effects. 


17. For example, while mental disorders are still largely perceived as shameful in 
Mexico, the Mexican National Comorbidity Survey interviewers experienced few 
refusals and over the course of speaking with respondents found that people 
were willing to open up about their mental health, often for the first time 
ever (Medina-Mora et al., 2008[153]). 


18. Floor effects occur when there is bunching at the lower end of the scale, 
whereas ceiling effects occur when there is bunching at the upper end of the scale. 
In Figure 3.9, the GHQ-12 shows floor effects in that most respondents fall at the 
lower end of the scale, which indicates they are not at any significant risk for mental 
distress. Because the scale focuses on those experiencing distress, it may then be 
less sensitive at distinguishing between individuals with higher levels of underlying 
positive mental health. 
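
A simple way to check for the bunching described in this note is to compute the share of respondents at the minimum and maximum possible scores. The sketch below is illustrative only: the function name and the example scores are hypothetical, and the 0-12 range assumes the GHQ-12 is scored with the binary (0-0-1-1) method.

```python
def floor_ceiling_shares(scores, scale_min, scale_max):
    """Return the shares of respondents at the lowest and highest possible scores."""
    n = len(scores)
    if n == 0:
        raise ValueError("at least one score is required")
    floor_share = sum(1 for s in scores if s == scale_min) / n
    ceiling_share = sum(1 for s in scores if s == scale_max) / n
    return floor_share, ceiling_share


# Hypothetical GHQ-12 totals: most respondents report no symptoms at all.
scores = [0, 0, 0, 0, 0, 0, 1, 1, 2, 12]
floor_share, ceiling_share = floor_ceiling_shares(scores, scale_min=0, scale_max=12)
print(floor_share, ceiling_share)
```

A large floor share on a distress scale means the tool cannot separate respondents who are merely free of symptoms from those with high positive mental health.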


19. However, some studies suggest a ceiling effect is present for (S)WEMWBS. 


20. Despite the rigor of the clinical validation process, criticisms of threshold scores 
remain. The cut-off scores that optimise sensitivity and specificity can differ - at times 
considerably - across population groups, and as a result alternatives to the use of cut- 
offs have been proposed (Goldberg, Oldehinkel and Ormel, 1998[151]). One such 
proposal is the application of stratum-specific likelihood ratios, rather than fixed 
thresholds, so as to allow for more detailed classification systems (Furukawa et al., 
2001[144]; Furukawa and Goldberg, 1999[145]). Additionally, new findings from 
research into self-reported symptoms have found that the use of single sum-scores 
and clinical cut-offs to estimate risk for major depression may conceal important 
clinical insights into depression research (Fried, 2017[146]). To overcome these 
issues, some researchers have recommended the use of multiple depression scales to 
generate robust and generalisable conclusions, or the use of scales that include 
important non-DSM symptoms (e.g. the Symptoms of Depression Questionnaire 
(Pedrelli et al., 2013[147])). While it is useful to note these nuances, for 
a government agency measuring population mental health at a macro level - as 
opposed to a healthcare professional at a clinical level - there is little to suggest that 
the use of threshold scores is inappropriate or uninformative. 


21. Other studies have used cut-off scores ranging from 54 to 76 (Thorsen et al., 
2013[130]; Hoeymans et al., 2004[154]). 


22. A respondent is deemed to be at risk for major depressive disorder if they 
answer “more than half the days” to either of the first two questions on the PHQ-8, 
and, in addition, a total of five or more of the eight items are reported as “more than 
half the days” (Eurostat, n.d.[74]). 
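
The scoring rule described in this note can be sketched as follows. This is a minimal illustration, not Eurostat's implementation: it assumes the usual PHQ item coding (0 = not at all, 1 = several days, 2 = more than half the days, 3 = nearly every day) and that "nearly every day" also satisfies the "more than half the days" criterion. The function name is hypothetical.

```python
MORE_THAN_HALF_THE_DAYS = 2  # assumed item coding: 0, 1, 2, 3


def phq8_at_risk_mdd(responses):
    """Apply the algorithm-based PHQ-8 rule for risk of major depressive disorder."""
    if len(responses) != 8:
        raise ValueError("the PHQ-8 has exactly eight items")
    # An item counts if it is reported at least "more than half the days" (assumption).
    meets = [r >= MORE_THAN_HALF_THE_DAYS for r in responses]
    core_symptom = meets[0] or meets[1]     # either of the first two questions
    enough_symptoms = sum(meets) >= 5       # five or more of the eight items
    return core_symptom and enough_symptoms


print(phq8_at_risk_mdd([2, 0, 2, 2, 2, 2, 0, 0]))  # core item plus five items in total
print(phq8_at_risk_mdd([0, 0, 2, 2, 2, 2, 2, 2]))  # six items, but neither core item
```

Because this rule differs from summing the items and applying a single cut-off, the two conventions can yield different prevalence estimates from the same responses.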


23. Lack of consistency in cut-off score usage can lead to confusion. For example, in 
2019 three entities in Los Angeles provided wildly different estimates for the 
prevalence of mental health conditions among the homeless population. The Los 
Angeles Times, the Los Angeles Homeless Services Authority and the California Policy 
Lab at the University of California Los Angeles made estimates of 67%, 29% and 78%, 
respectively. All these came from the same dataset, with differences stemming from 
statistical interpretation (Smith and Oreskes, 2019[157]). 


24. Other researchers have suggested fixed cut-off points on the SWEMWBS scale: 
low mental well-being (having a score between 7.00 and 19.98), moderate (19.99 to 
29.30) and high (29.31 to 35.00). These categories are derived from previous work on 
the Danish population, with low mental well-being corresponding to the bottom 
15th percentile of the distribution, and high mental well-being the top 
15th percentile (Santini et al., 2022[148]). Fixed cut-off scores have not been 
developed for the full-length WEMWBS. 
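
The fixed cut-off points quoted in this note can be applied as in the following sketch. The function name is hypothetical, and the input is assumed to be a SWEMWBS score on the 7.00-35.00 range used by the quoted cut-offs.

```python
def swemwbs_category(score: float) -> str:
    """Classify a SWEMWBS score using the fixed cut-offs quoted in this note."""
    if not 7.0 <= score <= 35.0:
        raise ValueError("SWEMWBS scores range from 7.00 to 35.00")
    if score <= 19.98:
        return "low"
    if score <= 29.30:
        return "moderate"
    return "high"


for score in (14.08, 19.99, 29.31):
    print(score, swemwbs_category(score))
```

Since the cut-offs were derived from the Danish distribution, applying them elsewhere assumes that the same score carries the same meaning across populations.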


25. Care should be taken when comparing statistics on risk for major depressive 
disorder, or risk for depressive symptoms, coming from the PHQ-4 vs. PHQ-8 or -9. 
There are a number of scoring conventions for the PHQ that can lead to different 
prevalence estimates. Directly comparable estimates can be created by calculating 
risk for depression from the two individual indicators that appear in both the PHQ-4 
and the PHQ-8. In this way, measures between general social and health surveys can 
be fully aligned, even if other (historical) health reporting has used the full set of PHQ- 
8 indicators to estimate depression risk prevalence. 
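
The alignment described in this note can be sketched as follows: the same rule is applied to the two depression items that both instruments share, whichever survey the responses come from. The field names and respondent records are hypothetical, and the cut-off of 3 on the summed two-item score is an assumption (a commonly used convention for the two-item depression scale), not a value taken from this report.

```python
# Hypothetical field names for the two depression items shared by the PHQ-4 and PHQ-8.
SHARED_ITEMS = ("little_interest", "feeling_down")


def at_risk_from_shared_items(record, cutoff=3):
    """Return True if the summed score on the two shared items reaches the cutoff."""
    return sum(record[item] for item in SHARED_ITEMS) >= cutoff


def prevalence(records, cutoff=3):
    """Share of respondents flagged as at risk, using only the shared items."""
    if not records:
        raise ValueError("at least one record is required")
    return sum(at_risk_from_shared_items(r, cutoff) for r in records) / len(records)


# Hypothetical respondents; items are coded 0-3. Additional PHQ-8 items are ignored.
general_survey = [
    {"little_interest": 2, "feeling_down": 1},
    {"little_interest": 0, "feeling_down": 1},
]
health_survey = [
    {"little_interest": 3, "feeling_down": 3, "sleep_problems": 2},
    {"little_interest": 1, "feeling_down": 1, "sleep_problems": 0},
]

print(prevalence(general_survey), prevalence(health_survey))
```

Restricting both surveys to the shared items sacrifices some of the information in the longer instrument, but it ensures that differences in prevalence are not an artefact of the scoring convention.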


